Abstract

A new estimation method for the two-component mixture model introduced in Vandekerkhove (2012) is proposed. This model, which consists of a two-component mixture of linear regressions in which one component is entirely known while the proportion, the slope, the intercept and the error distribution of the other component are unknown, seems to be of interest for the analysis of large datasets produced from two-color ChIP-chip high-density microarrays. In spite of good performance for datasets of reasonable size, the method proposed in Vandekerkhove (2012) suffers from a serious drawback when the sample size becomes large, as it is based on the optimization of a contrast function whose pointwise computation requires $O(n^2)$ operations. The range of applicability of the method derived in this work is substantially larger, as it is based on a method-of-moments estimator whose computation only requires $O(n)$ operations. From a theoretical perspective, the asymptotic normality of both the estimator of the Euclidean parameter vector and the semiparametric estimator of the c.d.f. of the error is proved under weak conditions not involving the zero-symmetry assumption typically used over the last decade. The finite-sample performance of the latter estimators is studied under various scenarios through Monte Carlo experiments. From a more practical perspective, the proposed method is applied to the tone data analyzed, among others, by Hunter and Young (2012), and to the ChIPmix data studied by Martin-Magniette et al. (2008). An extension of the considered model involving an unknown scale parameter for the first component is discussed in the final section.



1 Introduction 



Practitioners are frequently interested in modeling the relationship between a random response variable Y and a d-dimensional random explanatory vector X by means of a linear regression model estimated from a random sample $(X_i, Y_i)_{1 \le i \le n}$ of $(X, Y)$. Quite often, the homogeneity assumption claiming that the linear regression coefficients are the same for all the observations $(X_1, Y_1), \dots, (X_n, Y_n)$ is inadequate. To allow different parameters for different groups of observations, a Finite Mixture of Regressions (FMR) can be considered; see Leisch (2004) and Grün and Leisch (2006) for a nice overview.



Statistical inference for the fully parametric FMR model was first considered by Quandt and Ramsey (1978), who proposed an estimation method based on the moment generating function. An EM estimation approach was proposed by De Veaux (1989) in the case of two components. Variations of the latter approach were also considered in Jones and McLachlan (1992) and Turner (2000). Hawkins et al. (2001) studied the problem of determining the number of components in the parametric FMR model using methods derived from the likelihood equation. In Hurn et al. (2003), the authors proposed a Bayesian approach to estimate the regression coefficients and also considered an extension of the model in which the number of components is unspecified. Zhu and Zhang (2004) established the asymptotic theory for maximum likelihood estimators in parametric FMR models. More recently, Städler et al. (2010) proposed an $\ell_1$-penalized method based on a Lasso-type estimator for a high-dimensional FMR model with $d \gg n$.

As an alternative to parametric approaches to the estimation of an FMR model, some authors suggested the use of more flexible semiparametric approaches. This research direction finds its origin in the work of Hall and Zhou (2003), in which d-variate semiparametric mixture models of random vectors with independent components were considered. These authors showed in particular that, for $d \ge 3$, it is possible to identify a two-component model without parametrizing the distributions of the component random vectors. To the best of our knowledge, Leung and Qin (2006) were the first to estimate an FMR model semiparametrically. In the two-component case, they studied the situation in which the components are related by Anderson (1979)'s exponential tilt model. Hunter and Young (2012) studied the identifiability of an m-component semiparametric FMR model and numerically investigated an EM algorithm for estimating its parameters. Vandekerkhove (2012) proposed an M-estimation method for a two-component semiparametric mixture of regressions with symmetric errors in which one component is known. The latter approach was applied to data extracted from a high-density microarray and modeled in Martin-Magniette et al. (2008) by means of a parametric FMR. The semiparametric approach of Vandekerkhove (2012) is of interest for two main reasons. Due to its semiparametric nature, the method allows one to detect complex structures in the error of the unknown regression component. It can additionally be regarded as a tool to assess the relevance of the usual EM-type Euclidean parameter estimation. Its main drawbacks, however, are that it is not theoretically valid when the errors are not symmetric and that it is computationally very expensive for large datasets, as it requires the optimization of a contrast function whose pointwise evaluation requires $O(n^2)$ operations.



The object of interest of this paper is the two-component FMR model studied by Vandekerkhove (2012) in which one component is entirely known while the proportion, the slope, the intercept and the error distribution of the other component are unknown. The estimation of the Euclidean parameter vector is achieved through a method of moments. Semiparametric estimators of the c.d.f. and the p.d.f. of the error of the unknown component are proposed. The proof of the asymptotic normality of the Euclidean and functional estimators is not based on zero-symmetry-like assumptions frequently found in the literature but only involves finite moments of order eight for the explanatory variable and the boundedness of the p.d.f.s of the errors and their derivatives. The almost sure uniform consistency of the estimator of the p.d.f. of the unknown error is obtained under similar conditions. A consequence of these theoretical results is that, unlike for EM-type approaches, the estimation uncertainty can be assessed through large-sample standard errors for the Euclidean parameters and by means of an approximate confidence band for the c.d.f. of the unknown error. The latter is computed using an unconditional weighted bootstrap whose asymptotic validity is proved.

From a practical perspective, it is worth mentioning that the range of applicability of the resulting semiparametric estimation procedure is substantially larger than that of Vandekerkhove (2012), as its computation only requires $O(n)$ operations. As a consequence, very large datasets can be easily processed. For instance, as shall be seen in Section 6, the estimation of the parameters of the model from the ChIPmix data considered in Martin-Magniette et al. (2008), consisting of n = 176,343 observations, took less than 30 seconds on one 2.4 GHz processor. The estimation of the same model from a subset of n = 30,000 observations using the method of Vandekerkhove (2012) took more than two days on a similar processor.

The paper is organized as follows. Section 2 is devoted to a detailed description of the model, while Section 3 is concerned with its identifiability through the moment method. The estimators of the Euclidean parameter vector and of the functional parameter are described in detail in Section 4. The finite-sample performance of the proposed estimation method is studied for various scenarios through Monte Carlo experiments in Section 5. In Section 6, the proposed method is applied to the tone data analyzed, among others, by Hunter and Young (2012), and to the ChIPmix data considered in Martin-Magniette et al. (2008). An extension of the FMR model under consideration involving an unknown scale parameter for the first component is discussed in the final section.
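2 Problem and notation

Let Z be a Bernoulli random variable with unknown parameter $\pi_0 \in [0, 1]$, let X be an $\mathcal{X}$-valued random variable with $\mathcal{X} \subset \mathbb{R}$, and let $\varepsilon^*$, $\varepsilon^{**}$ be two absolutely continuous centered real-valued random variables with finite variances and independent of X. Assume additionally that Z is independent of X, $\varepsilon^*$ and $\varepsilon^{**}$. Furthermore, for fixed $\alpha_0^*, \beta_0^*, \alpha_0^{**}, \beta_0^{**} \in \mathbb{R}$, let Y be the random variable defined by

$$Y = (1 - Z)(\alpha_0^* + \beta_0^* X + \varepsilon^*) + Z(\alpha_0^{**} + \beta_0^{**} X + \varepsilon^{**}),$$

that is,

$$Y = \begin{cases} \alpha_0^* + \beta_0^* X + \varepsilon^* & \text{if } Z = 0, \\ \alpha_0^{**} + \beta_0^{**} X + \varepsilon^{**} & \text{if } Z = 1. \end{cases}$$

The above display is the equation of a mixture of two linear regressions with Z as mixing variable.

Let $F^*$ and $F^{**}$ denote the c.d.f.s of $\varepsilon^*$ and $\varepsilon^{**}$, respectively. Furthermore, $\alpha_0^*$, $\beta_0^*$ and $F^*$ are assumed known while $\alpha_0^{**}$, $\beta_0^{**}$, $\pi_0$ and $F^{**}$ are assumed unknown. The aim of this work is to propose and study an estimator of $(\alpha_0^{**}, \beta_0^{**}, \pi_0, F^{**})$ based on n i.i.d. copies $(X_i, Y_i)_{1 \le i \le n}$ of $(X, Y)$. Now, define $\tilde Y = Y - \alpha_0^* - \beta_0^* X$, $\alpha_0 = \alpha_0^{**} - \alpha_0^*$ and $\beta_0 = \beta_0^{**} - \beta_0^*$, and notice that

$$\tilde Y = \begin{cases} \varepsilon^* & \text{if } Z = 0, \\ \alpha_0 + \beta_0 X + \varepsilon & \text{if } Z = 1, \end{cases} \tag{1}$$

where, to simplify the notation, $\varepsilon = \varepsilon^{**}$ and $F = F^{**}$. It follows that the previous estimation problem is equivalent to the problem of estimating $(\alpha_0, \beta_0, \pi_0, F)$ from the observation of n i.i.d. copies $(X_i, \tilde Y_i)_{1 \le i \le n}$ of $(X, \tilde Y)$.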
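To make model (1) concrete, here is a minimal Python sketch (standard library only) that draws i.i.d. copies of $(X, \tilde Y)$. The choice $\varepsilon^* \sim N(0, 1)$ and the helper name `simulate_model` are illustrative assumptions rather than part of the method; the callables `sample_x` and `sample_eps` let the reader plug in any distribution for X and $\varepsilon$.

```python
import random


def simulate_model(n, pi0, alpha0, beta0, sample_x, sample_eps, seed=None):
    """Draw n i.i.d. copies of (X, Y~) from model (1).

    Assumption: the known-component error eps* is N(0, 1). sample_x and
    sample_eps are callables taking a random.Random and returning one draw.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = sample_x(rng)
        z = 1 if rng.random() < pi0 else 0      # Z ~ Bernoulli(pi0)
        if z == 0:
            y = rng.gauss(0.0, 1.0)             # known component: Y~ = eps*
        else:
            y = alpha0 + beta0 * x + sample_eps(rng)  # unknown component
        xs.append(x)
        ys.append(y)
    return xs, ys


# Example: X ~ N(2, 3^2), eps ~ N(0, 1), pi0 = 0.7, (alpha0, beta0) = (2, 1).
xs, ys = simulate_model(
    20000, 0.7, 2.0, 1.0,
    sample_x=lambda r: r.gauss(2.0, 3.0),
    sample_eps=lambda r: r.gauss(0.0, 1.0),
    seed=1,
)
mean_y = sum(ys) / len(ys)
```

In this example the sample mean of $\tilde Y$ should be close to $\pi_0\{\alpha_0 + \beta_0 E(X)\} = 0.7 \times 4 = 2.8$.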



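As we continue, the unknown c.d.f.s of X and $\tilde Y$ will be denoted by $F_X$ and $F_{\tilde Y}$, respectively. Also, for any $x \in \mathcal{X}$, the conditional c.d.f. of $\tilde Y$ given X = x will be denoted by $F_{\tilde Y|X}(\cdot|x)$, and we have

$$F_{\tilde Y|X}(y|x) = (1 - \pi_0) F^*(y) + \pi_0 F(y - \alpha_0 - \beta_0 x), \quad y \in \mathbb{R}. \tag{2}$$

It follows that, for any $x \in \mathcal{X}$, $f_{\tilde Y|X}(\cdot|x)$, the conditional p.d.f. of $\tilde Y$ given X = x, can be expressed as

$$f_{\tilde Y|X}(y|x) = (1 - \pi_0) f^*(y) + \pi_0 f(y - \alpha_0 - \beta_0 x), \quad y \in \mathbb{R}, \tag{3}$$

where $f^*$ and $f$ are the p.d.f.s of $\varepsilon^*$ and $\varepsilon$, assuming that they exist on $\mathbb{R}$.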

Note that, as shall be discussed in Section 7, it is possible to consider a slightly more general version of this model involving an unknown scale parameter for the first component. This more elaborate model remains identifiable and estimation through the moment method is theoretically possible. However, from a practical perspective, estimation of this scale parameter through the moment method seems so unstable that an alternative estimation method appears to be required.
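3 Identifiability through the moment method

Since (1) is clearly equivalent to

$$\tilde Y = (1 - Z)\varepsilon^* + Z(\alpha_0 + \beta_0 X + \varepsilon), \tag{4}$$

we immediately obtain that

$$E(\tilde Y | X) = \pi_0 \alpha_0 + \pi_0 \beta_0 X \quad \text{a.s.} \tag{5}$$

It follows that the coefficients $\gamma_{0,1} = \pi_0\alpha_0$ and $\gamma_{0,2} = \pi_0\beta_0$ can be identified from (5) if $\mathcal{X}$ is not reduced to a singleton. In addition, since $Z(1 - Z) = 0$, we have

$$\begin{aligned} E(\tilde Y^2 | X) &= E[\{(1 - Z)\varepsilon^* + Z(\alpha_0 + \beta_0 X + \varepsilon)\}^2 | X] \\ &= E(1 - Z)\,E\{(\varepsilon^*)^2\} + E(Z)\,E\{(\alpha_0 + \beta_0 X + \varepsilon)^2 | X\} \\ &= (1 - \pi_0)(\sigma_0^*)^2 + \pi_0(\alpha_0^2 + 2\alpha_0\beta_0 X + \beta_0^2 X^2 + \sigma_0^2) \\ &= (1 - \pi_0)(\sigma_0^*)^2 + \pi_0(\alpha_0^2 + \sigma_0^2) + 2\pi_0\alpha_0\beta_0 X + \pi_0\beta_0^2 X^2 \quad \text{a.s.,} \end{aligned} \tag{6}$$

where $\sigma_0^*$ and $\sigma_0$ are the standard deviations of $\varepsilon^*$ and $\varepsilon$, respectively. If $\mathcal{X}$ contains three points $x_1, x_2, x_3$ such that the vectors $(1, x_1, x_1^2)$, $(1, x_2, x_2^2)$, $(1, x_3, x_3^2)$ are linearly independent then, from (6), we can identify the coefficients $\gamma_{0,3} = (1 - \pi_0)(\sigma_0^*)^2 + \pi_0(\alpha_0^2 + \sigma_0^2)$, $\gamma_{0,4} = 2\pi_0\alpha_0\beta_0$ and $\gamma_{0,5} = \pi_0\beta_0^2$. In other words, under the aforementioned conditions on $\mathcal{X}$, we have

$$\begin{cases} \gamma_{0,1} = \pi_0\alpha_0, \\ \gamma_{0,2} = \pi_0\beta_0, \\ \gamma_{0,3} = (1 - \pi_0)(\sigma_0^*)^2 + \pi_0(\alpha_0^2 + \sigma_0^2), \\ \gamma_{0,4} = 2\pi_0\alpha_0\beta_0 = 2\alpha_0\gamma_{0,2}, \\ \gamma_{0,5} = \pi_0\beta_0^2 = \beta_0\gamma_{0,2}. \end{cases} \tag{7}$$

From the above system of equations, we see that $\alpha_0$, $\beta_0$ and $\pi_0$ can be identified provided $\pi_0\beta_0 \neq 0$, that is, provided the unknown component actually exists and its slope is nonzero. The latter condition will be assumed to hold in the rest of the paper.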
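System (7) can be checked numerically. The sketch below (hypothetical helper names) maps $(\alpha_0, \beta_0, \pi_0, \sigma_0^*, \sigma_0)$ to $\gamma_0$ and then recovers $(\alpha_0, \beta_0, \pi_0)$ from $\gamma_{0,1}$, $\gamma_{0,2}$, $\gamma_{0,4}$ and $\gamma_{0,5}$ only; the divisions by $\gamma_{0,2} = \pi_0\beta_0$ and by $\beta_0$ show exactly where the condition $\pi_0\beta_0 \neq 0$ is needed.

```python
def gamma_from_params(alpha0, beta0, pi0, sigma_star, sigma):
    """Population coefficients (gamma_{0,1}, ..., gamma_{0,5}) given by system (7)."""
    return (
        pi0 * alpha0,
        pi0 * beta0,
        (1 - pi0) * sigma_star ** 2 + pi0 * (alpha0 ** 2 + sigma ** 2),
        2 * pi0 * alpha0 * beta0,
        pi0 * beta0 ** 2,
    )


def params_from_gamma(g):
    """Invert (7) for (alpha0, beta0, pi0); requires pi0 * beta0 != 0."""
    g1, g2, _g3, g4, g5 = g   # gamma_{0,3} is unused: it involves sigma_0
    beta0 = g5 / g2           # gamma_{0,5} = beta0 * gamma_{0,2}
    alpha0 = g4 / (2 * g2)    # gamma_{0,4} = 2 * alpha0 * gamma_{0,2}
    pi0 = g2 / beta0          # gamma_{0,2} = pi0 * beta0
    return alpha0, beta0, pi0


g = gamma_from_params(alpha0=2.0, beta0=1.0, pi0=0.7, sigma_star=1.0, sigma=1.0)
recovered = params_from_gamma(g)
```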

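Let us now consider the functional part F of the model. For any $\eta = (\alpha, \beta) \in \mathbb{R}^2$, denote by $J(\cdot, \eta)$ the c.d.f. defined by

$$J(t, \eta) = \Pr(\tilde Y \le t + \alpha + \beta X), \quad t \in \mathbb{R}. \tag{8}$$

For any $t \in \mathbb{R}$, this can be rewritten as

$$\begin{aligned} J(t, \eta) &= \int_{\mathbb{R}} F_{\tilde Y|X}(t + \alpha + \beta x | x)\, dF_X(x) \\ &= (1 - \pi_0)\int_{\mathbb{R}} F^*(t + \alpha + \beta x)\, dF_X(x) + \pi_0 \int_{\mathbb{R}} F\{t + (\alpha - \alpha_0) + (\beta - \beta_0)x\}\, dF_X(x). \end{aligned}$$

For $\eta = \eta_0 = (\alpha_0, \beta_0)$, we then obtain

$$J(t, \eta_0) = (1 - \pi_0)\int_{\mathbb{R}} F^*(t + \alpha_0 + \beta_0 x)\, dF_X(x) + \pi_0 F(t), \quad t \in \mathbb{R}.$$

Now, for any $\eta \in \mathbb{R}^2$, let $K(\cdot, \eta)$ be defined by

$$K(t, \eta) = \int_{\mathbb{R}} F^*(t + \alpha + \beta x)\, dF_X(x), \quad t \in \mathbb{R}. \tag{9}$$

It follows that F is identified since

$$F(t) = \frac{1}{\pi_0}\{J(t, \eta_0) - (1 - \pi_0)K(t, \eta_0)\}, \quad t \in \mathbb{R}. \tag{10}$$

The above equation is at the root of the derivation of an estimator for F.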



4 Estimation 



Let P be the probability distribution of $(X, \tilde Y)$. For ease of exposition, we will frequently use the notation adopted in the theory of empirical processes in the sense of van der Vaart and Wellner (2000) or Kosorok (2008), for instance. Given a measurable function $f: \mathbb{R}^2 \to \mathbb{R}^k$ for some integer $k \ge 1$, $Pf$ will denote the integral $\int f\, dP$. Also, the empirical measure obtained from the random sample $(X_i, \tilde Y_i)_{1 \le i \le n}$ will be denoted by $\mathbb{P}_n = n^{-1}\sum_{i=1}^n \delta_{X_i, \tilde Y_i}$, where $\delta_{x,y}$ is the probability distribution that assigns a mass of 1 at (x, y). The expectation of f under the empirical measure is then $\mathbb{P}_n f = n^{-1}\sum_{i=1}^n f(X_i, \tilde Y_i)$, and the quantity $\mathbb{G}_n f = \sqrt{n}(\mathbb{P}_n f - Pf)$ is the empirical process evaluated at f. The arrow $\rightsquigarrow$ will be used to denote weak convergence in the sense of Definition 1.3.3 of van der Vaart and Wellner (2000) and, for any set S, $\ell^\infty(S)$ will stand for the space of all bounded real-valued functions on S equipped with the uniform metric. Key results and more details can be found, for instance, in van der Vaart (1998), van der Vaart and Wellner (2000) and Kosorok (2008).
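For readers less familiar with this notation, $\mathbb{P}_n f$ is simply a sample average and $\mathbb{G}_n f$ its centered and rescaled version. A tiny illustration with hypothetical function names, in which $Pf$ is supplied by the caller (in practice it is unknown):

```python
import math


def empirical_mean(f, xs, ys):
    """P_n f = n^{-1} sum_i f(X_i, Y_i)."""
    return sum(f(x, y) for x, y in zip(xs, ys)) / len(xs)


def empirical_process(f, xs, ys, pf):
    """G_n f = sqrt(n) (P_n f - P f), where pf = P f is supplied by the caller."""
    return math.sqrt(len(xs)) * (empirical_mean(f, xs, ys) - pf)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, -1.0, 2.0, 0.0]


def f(x, y):
    """A toy measurable function f: R^2 -> R."""
    return x * y


pn_f = empirical_mean(f, xs, ys)          # (0 - 1 + 4 + 0) / 4 = 0.75
gn_f = empirical_process(f, xs, ys, 0.5)  # sqrt(4) * (0.75 - 0.5) = 0.5
```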



4.1 Estimation of the Euclidean parameter vector 

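To estimate the Euclidean parameter vector $(\alpha_0, \beta_0, \pi_0) \in \mathbb{R} \times \mathbb{R}\setminus\{0\} \times (0, 1]$, we first need to estimate the vector $\gamma_0 = (\gamma_{0,1}, \dots, \gamma_{0,5}) \in \mathbb{R}^5$ whose components were expressed in terms of $\alpha_0$, $\beta_0$ and $\pi_0$ in the previous section. From (5) and (6), it is natural to consider the regression function $d_n(\gamma) = \mathbb{P}_n \varphi_\gamma$, $\gamma \in \mathbb{R}^5$, where, for any $\gamma \in \mathbb{R}^5$, $\varphi_\gamma: \mathbb{R}^2 \to \mathbb{R}$ is defined by

$$\varphi_\gamma(x, y) = (y - \gamma_1 - \gamma_2 x)^2 + (y^2 - \gamma_3 - \gamma_4 x - \gamma_5 x^2)^2, \quad x, y \in \mathbb{R}.$$

As an estimator of $\gamma_0$, we then naturally consider $\gamma_n = \arg\min_\gamma d_n(\gamma)$, which satisfies $\mathbb{P}_n \dot\varphi_{\gamma_n} = 0$, where $\dot\varphi_\gamma$, the gradient of $\varphi_\gamma$ with respect to $\gamma$, is given by

$$\dot\varphi_\gamma(x, y) = -2\begin{pmatrix} y - \gamma_1 - \gamma_2 x \\ x(y - \gamma_1 - \gamma_2 x) \\ y^2 - \gamma_3 - \gamma_4 x - \gamma_5 x^2 \\ x(y^2 - \gamma_3 - \gamma_4 x - \gamma_5 x^2) \\ x^2(y^2 - \gamma_3 - \gamma_4 x - \gamma_5 x^2) \end{pmatrix}, \quad x, y \in \mathbb{R}.$$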



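Now, for any integers $p, q \ge 0$, define $\overline{X^p Y^q} = n^{-1}\sum_{i=1}^n X_i^p \tilde Y_i^q$, and let

$$\Gamma_n = 2\begin{pmatrix} 1 & \bar X & 0 & 0 & 0 \\ \bar X & \overline{X^2} & 0 & 0 & 0 \\ 0 & 0 & 1 & \bar X & \overline{X^2} \\ 0 & 0 & \bar X & \overline{X^2} & \overline{X^3} \\ 0 & 0 & \overline{X^2} & \overline{X^3} & \overline{X^4} \end{pmatrix} \quad \text{and} \quad \theta_n = 2\begin{pmatrix} \bar Y \\ \overline{XY} \\ \overline{Y^2} \\ \overline{XY^2} \\ \overline{X^2Y^2} \end{pmatrix},$$

which respectively estimate

$$\Gamma_0 = 2\begin{pmatrix} 1 & E(X) & 0 & 0 & 0 \\ E(X) & E(X^2) & 0 & 0 & 0 \\ 0 & 0 & 1 & E(X) & E(X^2) \\ 0 & 0 & E(X) & E(X^2) & E(X^3) \\ 0 & 0 & E(X^2) & E(X^3) & E(X^4) \end{pmatrix} \quad \text{and} \quad \theta_0 = 2\begin{pmatrix} E(\tilde Y) \\ E(X\tilde Y) \\ E(\tilde Y^2) \\ E(X\tilde Y^2) \\ E(X^2\tilde Y^2) \end{pmatrix}.$$

With this notation, $\mathbb{P}_n\dot\varphi_\gamma = \Gamma_n\gamma - \theta_n$, so that the linear equation $\mathbb{P}_n \dot\varphi_{\gamma_n} = 0$ can equivalently be rewritten as $\Gamma_n\gamma_n = \theta_n$. Provided the matrices $\Gamma_n$ and $\Gamma_0$ are invertible, we can write $\gamma_n = \Gamma_n^{-1}\theta_n$ and $\gamma_0 = \Gamma_0^{-1}\theta_0$.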
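Because $\Gamma_n$ is block diagonal, computing $\gamma_n = \Gamma_n^{-1}\theta_n$ amounts to two ordinary least-squares problems: $\tilde Y$ on $(1, X)$ and $\tilde Y^2$ on $(1, X, X^2)$. The following sketch (plain Gaussian elimination, hypothetical function names) solves both blocks; the common factor 2 in $\Gamma_n$ and $\theta_n$ cancels and is omitted. On noiseless data with $\tilde Y = 1 + 2X$, it returns $\gamma = (1, 2, 1, 4, 4)$, since then $\tilde Y^2 = 1 + 4X + 4X^2$.

```python
def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("singular system")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * z[c] for c in range(r + 1, m))
        z[r] = (aug[r][m] - s) / aug[r][r]
    return z


def moment(xs, ys, p, q):
    """Sample moment n^{-1} sum_i X_i^p Y_i^q."""
    return sum(x ** p * y ** q for x, y in zip(xs, ys)) / len(xs)


def gamma_n_five(xs, ys):
    """gamma_n = Gamma_n^{-1} theta_n for the 5-parameter regression function."""
    m = [moment(xs, ys, p, 0) for p in range(5)]  # m[p] = mean of X^p
    g12 = solve([[1.0, m[1]], [m[1], m[2]]],
                [moment(xs, ys, 0, 1), moment(xs, ys, 1, 1)])
    g345 = solve([[1.0, m[1], m[2]], [m[1], m[2], m[3]], [m[2], m[3], m[4]]],
                 [moment(xs, ys, 0, 2), moment(xs, ys, 1, 2), moment(xs, ys, 2, 2)])
    return g12 + g345


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]
gamma_n = gamma_n_five(xs, ys)
```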

To obtain an estimator of $(\alpha_0, \beta_0, \pi_0)$, we use the relationships induced by (5) and (6) and recalled in (7). Leaving the third equation aside because it involves the unknown standard deviation $\sigma_0$ of $\varepsilon$, we obtain three possible estimators of $\alpha_0$:

$$\alpha_n = \frac{\gamma_{n,1}\gamma_{n,5}}{\gamma_{n,2}^2}, \quad \alpha_n = \frac{\gamma_{n,4}}{2\gamma_{n,2}} \quad \text{or} \quad \alpha_n = \frac{\gamma_{n,4}^2}{4\gamma_{n,1}\gamma_{n,5}},$$

three possible estimators of $\beta_0$:

$$\beta_n = \frac{\gamma_{n,5}}{\gamma_{n,2}}, \quad \beta_n = \frac{\gamma_{n,4}}{2\gamma_{n,1}} \quad \text{or} \quad \beta_n = \frac{\gamma_{n,2}\gamma_{n,4}^2}{4\gamma_{n,1}^2\gamma_{n,5}},$$

and three possible estimators of $\pi_0$:

$$\pi_n = \frac{\gamma_{n,2}^2}{\gamma_{n,5}}, \quad \pi_n = \frac{2\gamma_{n,1}\gamma_{n,2}}{\gamma_{n,4}} \quad \text{or} \quad \pi_n = \frac{4\gamma_{n,1}^2\gamma_{n,5}}{\gamma_{n,4}^2}.$$

There are therefore 27 possible estimators of $(\alpha_0, \beta_0, \pi_0)$. Their asymptotics can be obtained under very reasonable conditions. Unfortunately, all 27 estimators turned out to behave quite poorly in small samples. This prompted us to look for alternative estimators within the "same class".
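At the population level, all 27 combinations coincide. The sketch below (hypothetical function name) evaluates the nine ratios at $\gamma_0$ computed from (7) with $(\alpha_0, \beta_0, \pi_0) = (2, 1, 0.7)$ and checks this; in finite samples the same ratios are evaluated at $\gamma_n$ and generally differ.

```python
from itertools import product


def candidate_estimators(g):
    """The three candidate estimators of alpha0, beta0 and pi0 derived from (7)."""
    g1, g2, _g3, g4, g5 = g
    alphas = (g1 * g5 / g2 ** 2, g4 / (2 * g2), g4 ** 2 / (4 * g1 * g5))
    betas = (g5 / g2, g4 / (2 * g1), g2 * g4 ** 2 / (4 * g1 ** 2 * g5))
    pis = (g2 ** 2 / g5, 2 * g1 * g2 / g4, 4 * g1 ** 2 * g5 / g4 ** 2)
    return alphas, betas, pis


# Population gamma_0 from (7) with alpha0 = 2, beta0 = 1, pi0 = 0.7;
# gamma_{0,3} plays no role here and is set to 0.
g0 = (0.7 * 2.0, 0.7 * 1.0, 0.0, 2 * 0.7 * 2.0 * 1.0, 0.7 * 1.0 ** 2)
alphas, betas, pis = candidate_estimators(g0)
combos = list(product(alphas, betas, pis))  # the 27 (alpha, beta, pi) triples
```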

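We now describe an estimator of $(\alpha_0, \beta_0, \pi_0)$ that was obtained empirically and that behaves significantly better for small samples than the aforementioned ones. The new regression function under consideration is $d_n(\gamma) = \mathbb{P}_n\varphi_\gamma$, $\gamma \in \mathbb{R}^8$, where, for any $(x, y) \in \mathbb{R}^2$,

$$\varphi_\gamma(x, y) = (y - \gamma_1 - \gamma_2 x)^2 + (y^2 - \gamma_3 - \gamma_4 x^2)^2 + (x - \gamma_5)^2 + (x^2 - \gamma_6)^2 + (x^3 - \gamma_7)^2 + (x^4 - \gamma_8)^2.$$

Now, let

$$\Gamma_n = 2\begin{pmatrix} A_n & 0 & 0 \\ 0 & B_n & 0 \\ 0 & 0 & I_4 \end{pmatrix}, \quad A_n = \begin{pmatrix} 1 & \bar X \\ \bar X & \overline{X^2} \end{pmatrix}, \quad B_n = \begin{pmatrix} 1 & \overline{X^2} \\ \overline{X^2} & \overline{X^4} \end{pmatrix},$$

$$\theta_n = 2(\bar Y, \overline{XY}, \overline{Y^2}, \overline{X^2Y^2}, \bar X, \overline{X^2}, \overline{X^3}, \overline{X^4})^\top,$$

where $I_4$ is the $4 \times 4$ identity matrix, which respectively estimate

$$\Gamma_0 = 2\begin{pmatrix} A_0 & 0 & 0 \\ 0 & B_0 & 0 \\ 0 & 0 & I_4 \end{pmatrix}, \quad A_0 = \begin{pmatrix} 1 & E(X) \\ E(X) & E(X^2) \end{pmatrix}, \quad B_0 = \begin{pmatrix} 1 & E(X^2) \\ E(X^2) & E(X^4) \end{pmatrix},$$

$$\theta_0 = 2\{E(\tilde Y), E(X\tilde Y), E(\tilde Y^2), E(X^2\tilde Y^2), E(X), E(X^2), E(X^3), E(X^4)\}^\top.$$

Then, proceeding as previously, provided the matrices $\Gamma_n$ and $\Gamma_0$ are invertible, the estimator $\gamma_n = \arg\min_\gamma d_n(\gamma)$ of $\gamma_0 = \Gamma_0^{-1}\theta_0$ is given by $\gamma_n = \Gamma_n^{-1}\theta_n$. To obtain an estimator of $(\alpha_0, \beta_0, \pi_0)$, we have, from the second term of the regression function, that

$$\gamma_{0,4} = \frac{\mathrm{cov}(X^2, \tilde Y^2)}{V(X^2)} = \frac{\mathrm{cov}(X^2, \tilde Y^2)}{\gamma_{0,8} - \gamma_{0,6}^2},$$

where the second equality comes from the fact that $\gamma_{0,6} = E(X^2)$ and $\gamma_{0,8} = E(X^4)$. Now, using (6), we find

$$\mathrm{cov}(X^2, \tilde Y^2) = \pi_0\beta_0^2\, V(X^2) + 2\pi_0\alpha_0\beta_0\, \mathrm{cov}(X^2, X),$$

which, combined with the facts that $\gamma_{0,1} = \pi_0\alpha_0$, $\gamma_{0,2} = \pi_0\beta_0$, $\gamma_{0,5} = E(X)$ and $\gamma_{0,7} = E(X^3)$, gives

$$\mathrm{cov}(X^2, \tilde Y^2) = \gamma_{0,2}\beta_0(\gamma_{0,8} - \gamma_{0,6}^2) + 2\gamma_{0,1}\beta_0(\gamma_{0,7} - \gamma_{0,5}\gamma_{0,6}).$$

This leads to the following estimator of $(\alpha_0, \beta_0, \pi_0)$:

$$\beta_n = g^\beta(\gamma_n) = \frac{\gamma_{n,4}}{\gamma_{n,2} + 2\gamma_{n,1}(\gamma_{n,7} - \gamma_{n,5}\gamma_{n,6})/(\gamma_{n,8} - \gamma_{n,6}^2)}, \quad \pi_n = g^\pi(\gamma_n) = \frac{\gamma_{n,2}}{\beta_n}, \quad \alpha_n = g^\alpha(\gamma_n) = \frac{\gamma_{n,1}}{\pi_n}.$$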


As we continue, the subsets of $\mathbb{R}^8$ on which the functions $g^\alpha$, $g^\beta$ and $g^\pi$ exist and are differentiable will be denoted by $D^\alpha$, $D^\beta$ and $D^\pi$, respectively, and $D^{\alpha,\beta,\pi}$ will stand for $D^\alpha \cap D^\beta \cap D^\pi$.

To derive the asymptotic behavior of the estimator $(\alpha_n, \beta_n, \pi_n) = (g^\alpha(\gamma_n), g^\beta(\gamma_n), g^\pi(\gamma_n))$, we consider the following assumptions:

A1. (i) X has a finite fourth order moment; (ii) X has a finite eighth order moment.
A2. $V(X) > 0$ and $V(X^2) > 0$.

Clearly, Assumption A1 (ii) implies Assumption A1 (i), and Assumption A2 implies that the matrix $\Gamma_0$ is invertible.

The following result, proved in Appendix A, characterizes the asymptotic behavior of the estimator $(\alpha_n, \beta_n, \pi_n)$.

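Proposition 4.1. Assume that $\gamma_0 \in D^{\alpha,\beta,\pi}$.

(i) Under Assumptions A1 (i) and A2, $(\alpha_n, \beta_n, \pi_n) \to (\alpha_0, \beta_0, \pi_0)$ almost surely.

(ii) Suppose that Assumptions A1 (ii) and A2 are satisfied and let $\Psi_\gamma$ be the 3 by 8 matrix defined by

$$\Psi_\gamma = \begin{pmatrix} \frac{\partial g^\alpha}{\partial\gamma_1} & \cdots & \frac{\partial g^\alpha}{\partial\gamma_8} \\ \frac{\partial g^\beta}{\partial\gamma_1} & \cdots & \frac{\partial g^\beta}{\partial\gamma_8} \\ \frac{\partial g^\pi}{\partial\gamma_1} & \cdots & \frac{\partial g^\pi}{\partial\gamma_8} \end{pmatrix}(\gamma), \quad \gamma \in D^{\alpha,\beta,\pi}.$$

Then,

$$\sqrt{n}(\alpha_n - \alpha_0, \beta_n - \beta_0, \pi_n - \pi_0) = -\mathbb{G}_n(\Psi_{\gamma_0}\Gamma_0^{-1}\dot\varphi_{\gamma_0}) + o_P(1).$$

As a consequence, $\sqrt{n}(\alpha_n - \alpha_0, \beta_n - \beta_0, \pi_n - \pi_0)$ converges in distribution to a centered normal random vector with covariance matrix $\Sigma = \Psi_{\gamma_0}\Gamma_0^{-1}P(\dot\varphi_{\gamma_0}\dot\varphi_{\gamma_0}^\top)\Gamma_0^{-1}\Psi_{\gamma_0}^\top$, which can be consistently estimated by $\Sigma_n = \Psi_{\gamma_n}\Gamma_n^{-1}\mathbb{P}_n(\dot\varphi_{\gamma_n}\dot\varphi_{\gamma_n}^\top)\Gamma_n^{-1}\Psi_{\gamma_n}^\top$.

An immediate consequence of the previous result is that large-sample standard errors of $\alpha_n$, $\beta_n$ and $\pi_n$ are given by the square roots of the diagonal elements of the matrix $n^{-1}\Sigma_n$. The finite-sample performance of these estimators is investigated in Section 5 and they are used in the illustrations of Section 6.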
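The sandwich estimator $\Sigma_n$ can equivalently be computed as $\mathbb{P}_n(vv^\top)$, where $v = -\Psi_{\gamma_n}\Gamma_n^{-1}\dot\varphi_{\gamma_n}$ is evaluated at each observation. The sketch below does exactly this; approximating $\Psi_{\gamma_n}$ by central finite differences (step `h`) is our simplifying assumption, not necessarily how the authors computed it, and all function names are hypothetical. Because $\mathbb{P}_n\dot\varphi_{\gamma_n} = 0$, the returned mean influence `centre` should vanish up to rounding.

```python
import math


def g_map(g):
    """(g^alpha, g^beta, g^pi)(gamma) for gamma in R^8."""
    g1, g2, _g3, g4, g5, g6, g7, g8 = g
    beta = g4 / (g2 + 2 * g1 * (g7 - g5 * g6) / (g8 - g6 ** 2))
    pi = g2 / beta
    return [g1 / pi, beta, pi]


def jacobian(fun, g, h=1e-6):
    """3 x 8 matrix Psi_gamma, approximated by central finite differences."""
    psi = [[0.0] * len(g) for _ in range(3)]
    for k in range(len(g)):
        up, dn = list(g), list(g)
        up[k] += h
        dn[k] -= h
        for r, (a, b) in enumerate(zip(fun(up), fun(dn))):
            psi[r][k] = (a - b) / (2 * h)
    return psi


def inv2(a, b, d):
    """Inverse of the symmetric 2 x 2 matrix [[a, b], [b, d]]."""
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]


def sandwich_standard_errors(xs, ys):
    """Return (alpha_n, beta_n, pi_n), their standard errors, and the mean influence."""
    n = len(xs)

    def mean(f):
        return sum(f(x, y) for x, y in zip(xs, ys)) / n

    m = [mean(lambda x, y, p=p: x ** p) for p in range(5)]  # mean of X^p
    # Gamma_n = 2 diag(A_n, B_n, I_4); invert each block separately.
    a_inv = inv2(2.0, 2.0 * m[1], 2.0 * m[2])
    b_inv = inv2(2.0, 2.0 * m[2], 2.0 * m[4])

    def gamma_inv(v):
        """Apply Gamma_n^{-1} to an 8-vector block by block."""
        top = [a_inv[r][0] * v[0] + a_inv[r][1] * v[1] for r in range(2)]
        mid = [b_inv[r][0] * v[2] + b_inv[r][1] * v[3] for r in range(2)]
        return top + mid + [vk / 2.0 for vk in v[4:]]

    theta = [2.0 * mean(lambda x, y: y), 2.0 * mean(lambda x, y: x * y),
             2.0 * mean(lambda x, y: y * y), 2.0 * mean(lambda x, y: x * x * y * y),
             2.0 * m[1], 2.0 * m[2], 2.0 * m[3], 2.0 * m[4]]
    g = gamma_inv(theta)  # gamma_n = Gamma_n^{-1} theta_n
    psi = jacobian(g_map, g)
    infl = []  # rows: -Psi_{gamma_n} Gamma_n^{-1} phidot_{gamma_n}(X_i, Y_i)
    for x, y in zip(xs, ys):
        r1 = y - g[0] - g[1] * x
        r2 = y * y - g[2] - g[3] * x * x
        phidot = [-2.0 * c for c in (r1, x * r1, r2, x * x * r2, x - g[4],
                                     x ** 2 - g[5], x ** 3 - g[6], x ** 4 - g[7])]
        w = gamma_inv(phidot)
        infl.append([-sum(psi[r][k] * w[k] for k in range(8)) for r in range(3)])
    # Diagonal of Sigma_n = P_n(v v^T); standard error = sqrt(Sigma_n[j][j] / n).
    ses = [math.sqrt(sum(v[j] ** 2 for v in infl) / n / n) for j in range(3)]
    centre = [sum(v[j] for v in infl) / n for j in range(3)]
    return g_map(g), ses, centre


xs = [0.0, 1.0, 2.0, 3.0, 5.0]
ys = [2.0, 3.0, 4.0, 5.0, 7.0]
est, ses, centre = sandwich_standard_errors(xs, ys)
```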



4.2 Estimation of the functional parameter 



To estimate the unknown c.d.f. F of $\varepsilon$, it is natural to start from (10). For a known $\eta = (\alpha, \beta) \in \mathbb{R}^2$, the term $J(\cdot, \eta)$ defined in (8) may be estimated by the empirical c.d.f. of the random sample $(\tilde Y_i - \alpha - \beta X_i)_{1 \le i \le n}$, i.e.,

$$J_n(t, \eta) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}(\tilde Y_i - \alpha - \beta X_i \le t), \quad t \in \mathbb{R}.$$

Similarly, since $F^*$ (the c.d.f. of $\varepsilon^*$) is known, a natural estimator of the term $K(t, \eta)$ defined in (9) is given by the empirical mean of the random sample $\{F^*(t + \alpha + \beta X_i)\}_{1 \le i \le n}$, i.e.,

$$K_n(t, \eta) = \frac{1}{n}\sum_{i=1}^n F^*(t + \alpha + \beta X_i), \quad t \in \mathbb{R}.$$

To obtain estimators of $J(\cdot, \eta_0)$ and $K(\cdot, \eta_0)$, it is then natural to consider the plug-in estimators $J_n(\cdot, \eta_n)$ and $K_n(\cdot, \eta_n)$, respectively, based on the estimator $\eta_n = (\alpha_n, \beta_n) = (g^\alpha, g^\beta)(\gamma_n)$ of $\eta_0$ proposed in the previous subsection.

We shall therefore consider the following nonparametric estimator of F:

$$F_n(t) = \frac{1}{\pi_n}\{J_n(t, \eta_n) - (1 - \pi_n)K_n(t, \eta_n)\}, \quad t \in \mathbb{R}. \tag{11}$$

Note that $F_n$ is not necessarily a c.d.f., as it is not necessarily increasing and can be smaller than zero or greater than one. In practice, we shall consider the partially corrected estimator $(F_n \vee 0) \wedge 1$, where $\vee$ and $\wedge$ denote the maximum and minimum, respectively.
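Estimator (11) with its partial correction fits in a few lines. The sketch assumes $\varepsilon^* \sim N(0, 1)$, so that $F^*$ is the standard normal c.d.f. (computed via `math.erf`), and takes $(\alpha_n, \beta_n, \pi_n)$ as inputs; `f_n_cdf` is a hypothetical name.

```python
import math


def std_normal_cdf(u):
    """F* under the assumption eps* ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def f_n_cdf(t, xs, ys, alpha_n, beta_n, pi_n, known_cdf=std_normal_cdf):
    """Partially corrected estimator (F_n v 0) ^ 1 of F at t, following (11)."""
    n = len(xs)
    # J_n(t, eta_n): empirical c.d.f. of the residuals Y~_i - alpha_n - beta_n X_i.
    j_n = sum(1 for x, y in zip(xs, ys) if y - alpha_n - beta_n * x <= t) / n
    # K_n(t, eta_n): empirical mean of F*(t + alpha_n + beta_n X_i).
    k_n = sum(known_cdf(t + alpha_n + beta_n * x) for x in xs) / n
    raw = (j_n - (1.0 - pi_n) * k_n) / pi_n
    return min(max(raw, 0.0), 1.0)


xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 3.5]  # residuals for (alpha_n, beta_n) = (1, 1): 0, 1, 0.5
value = f_n_cdf(0.5, xs, ys, alpha_n=1.0, beta_n=1.0, pi_n=1.0)
```

When $\pi_n = 1$, $F_n$ reduces to the empirical c.d.f. of the residuals, which gives a quick check (here $2/3$ at $t = 0.5$).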

To derive the asymptotic behavior of the previous estimator, we consider the following additional assumptions on the p.d.f.s $f^*$ and $f$ of $\varepsilon^*$ and $\varepsilon$, respectively:

A3. (i) $f^*$ and $f$ exist and are bounded on $\mathbb{R}$; (ii) $(f^*)'$ and $f'$ exist and are bounded on $\mathbb{R}$.
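Before stating one of our main results, let us first define some additional notation. Let $\mathcal{F}^J$ and $\mathcal{F}^K$ be two classes of measurable functions from $\mathbb{R}^2$ to $\mathbb{R}$ defined respectively by

$$\mathcal{F}^J = \{(x, y) \mapsto \psi^J_{t,\eta}(x, y) = \mathbf{1}(y - \alpha - \beta x \le t) : t \in \mathbb{R}, \eta = (\alpha, \beta) \in \mathbb{R}^2\}$$

and

$$\mathcal{F}^K = \{(x, y) \mapsto \psi^K_{t,\eta}(x, y) = F^*(t + \alpha + \beta x) : t \in \mathbb{R}, \eta = (\alpha, \beta) \in \mathbb{R}^2\}.$$

Furthermore, let $\mathcal{D}$ be a bounded subset of $D^{\alpha,\beta,\pi}$ containing $\gamma_0$, and let $\mathcal{F}^{\alpha,\beta,\pi}$ be the class of measurable functions from $\mathbb{R}^2$ to $\mathbb{R}^3$ defined by

$$\mathcal{F}^{\alpha,\beta,\pi} = \{(x, y) \mapsto -\Psi_\gamma\Gamma_0^{-1}\dot\varphi_\gamma(x, y) = (\psi^\alpha_\gamma(x, y), \psi^\beta_\gamma(x, y), \psi^\pi_\gamma(x, y)) : \gamma \in \mathcal{D}\}.$$

With the previous notation, notice that, for any $t \in \mathbb{R}$,

$$\sqrt{n}\{J_n(t, \eta_0) - J(t, \eta_0)\} = \mathbb{G}_n\psi^J_{t,\eta_0} \quad \text{and} \quad \sqrt{n}\{K_n(t, \eta_0) - K(t, \eta_0)\} = \mathbb{G}_n\psi^K_{t,\eta_0},$$

and that, under Assumptions A1 (ii) and A2, Proposition 4.1 states that

$$\sqrt{n}(\alpha_n - \alpha_0, \beta_n - \beta_0, \pi_n - \pi_0) = \mathbb{G}_n(\psi^\alpha_{\gamma_0}, \psi^\beta_{\gamma_0}, \psi^\pi_{\gamma_0}) + o_P(1).$$

Next, for any $\gamma \in D^{\alpha,\beta,\pi}$ and $t \in \mathbb{R}$, let

$$\psi^F_{t,\gamma} = \frac{1}{\pi}\psi^J_{t,\eta} - \frac{1 - \pi}{\pi}\psi^K_{t,\eta} + f(t)(\psi^\alpha_\gamma + \gamma_5\psi^\beta_\gamma) - \frac{P\psi^J_{t,\eta} - P\psi^K_{t,\eta}}{\pi^2}\psi^\pi_\gamma, \tag{12}$$

with $\eta = (\alpha, \beta) = (g^\alpha, g^\beta)(\gamma)$ and $\pi = g^\pi(\gamma)$.

The following result, proved in Appendix B, gives the weak limit of the empirical process $\sqrt{n}(F_n - F)$.

Proposition 4.2. Assume that $\gamma_0 \in D^{\alpha,\beta,\pi}$ and that Assumptions A1, A2 and A3 hold. Then, for any $t \in \mathbb{R}$,

$$\sqrt{n}\{F_n(t) - F(t)\} = \mathbb{G}_n\psi^F_{t,\gamma_0} + Q_{n,t},$$

where $\sup_{t \in \mathbb{R}}|Q_{n,t}| = o_P(1)$, and the empirical process $t \mapsto \mathbb{G}_n\psi^F_{t,\gamma_0}$ converges weakly to $t \mapsto \mathbb{G}\psi^F_{t,\gamma_0}$ in $\ell^\infty(\mathbb{R})$.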

Let us now discuss the estimation of the p.d.f. f of $\varepsilon$. Starting from (10) and after differentiation, it seems sensible to estimate the expectation $E\{f^*(t + \alpha_0 + \beta_0 X)\}$, $t \in \mathbb{R}$, by the empirical mean of the observable sample $\{f^*(t + \alpha_n + \beta_n X_i)\}_{1 \le i \le n}$. Hence, a natural estimator of f can be defined, for any $t \in \mathbb{R}$, by

$$f_n(t) = \frac{1}{\pi_n}\left\{\frac{1}{n h_n}\sum_{i=1}^n k\left(\frac{t - \tilde Y_i + \alpha_n + \beta_n X_i}{h_n}\right) - \frac{1 - \pi_n}{n}\sum_{i=1}^n f^*(t + \alpha_n + \beta_n X_i)\right\}, \tag{13}$$

where k is a kernel function on $\mathbb{R}$ and $(h_n)_{n \ge 1}$ is a sequence of bandwidths converging to zero.

In the same way that $F_n$ is not necessarily a c.d.f., $f_n$ is not necessarily a p.d.f. In practice, we shall use the partially corrected estimator $f_n \vee 0$. A fully corrected estimator can be obtained from the work of Glad et al. (2003).
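A matching sketch of (13) with its partial correction $f_n \vee 0$. Assumptions: a Gaussian kernel k, a standard normal $f^*$, and a bandwidth `h` supplied by the caller (for instance of the form $c n^{-\alpha}$ as in Assumption A4 below; the paper itself relies on the plug-in selector of the ks R package). The function names are hypothetical.

```python
import math


def std_normal_pdf(u):
    """Standard normal density, used both as kernel k and as f* (assumptions)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def f_n_density(t, xs, ys, alpha_n, beta_n, pi_n, h,
                known_pdf=std_normal_pdf, kernel=std_normal_pdf):
    """Partially corrected estimator f_n v 0 of f at t, following (13)."""
    n = len(xs)
    # Kernel density estimate of the residuals Y~_i - alpha_n - beta_n X_i at t.
    kde = sum(kernel((t - y + alpha_n + beta_n * x) / h)
              for x, y in zip(xs, ys)) / (n * h)
    # Empirical mean of f*(t + alpha_n + beta_n X_i).
    known = sum(known_pdf(t + alpha_n + beta_n * x) for x in xs) / n
    return max((kde - (1.0 - pi_n) * known) / pi_n, 0.0)


single = f_n_density(0.0, [0.0], [0.0], alpha_n=0.0, beta_n=0.0, pi_n=1.0, h=1.0)
```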



Consider the following additional assumptions on $(h_n)_{n \ge 1}$, k and $f^*$:

A4. (i) $h_n = c n^{-\alpha}$ with $\alpha \in (0, 1/2)$ and $c > 0$ a constant; (ii) k is a p.d.f. with bounded variations on $\mathbb{R}$ and a finite first order moment; (iii) the p.d.f. $f^*$ has bounded variations on $\mathbb{R}$.

The following result is proved in Appendix C.

Proposition 4.3. If $\gamma_0 \in D^{\alpha,\beta,\pi}$, and under Assumptions A1 (i), A2, A3 and A4,

$$\sup_{t \in \mathbb{R}}|f_n(t) - f(t)| \to 0 \quad \text{a.s.}$$

Finally, note that, in all our numerical experiments, the kernel part of $f_n$ was computed using the excellent ks R package (Duong, 2012), in which the univariate plug-in selector of Wand and Jones (1994) was used for the bandwidth $h_n$.



4.3 An unconditional weighted bootstrap for $\sqrt{n}(F_n - F)$ with application to confidence bands for F

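In applications, it may be of interest to carry out inference on F. The result stated in this section can be used for this purpose. It is based on the unconditional multiplier central limit theorem for empirical processes (see, e.g., Kosorok, 2008, Theorem 10.1 and Corollary 10.3) and can be used to obtain approximate independent copies of $\sqrt{n}(F_n - F)$.

Given i.i.d. random variables $\xi_1, \dots, \xi_n$ with mean 0 and variance 1, satisfying $\int_0^\infty \{\Pr(|\xi_1| > x)\}^{1/2} dx < \infty$, and independent of the random sample $(X_i, \tilde Y_i)_{1 \le i \le n}$, let

$$\mathbb{G}'_n f = \frac{1}{\sqrt{n}}\sum_{i=1}^n (\xi_i - \bar\xi) f(X_i, \tilde Y_i),$$

where $\bar\xi = n^{-1}\sum_{i=1}^n \xi_i$. Also, let $\hat\psi^{\alpha,\beta,\pi}_{\gamma_n} = -\Psi_{\gamma_n}\Gamma_n^{-1}\dot\varphi_{\gamma_n} = (\hat\psi^\alpha_{\gamma_n}, \hat\psi^\beta_{\gamma_n}, \hat\psi^\pi_{\gamma_n})$ and, for any $t \in \mathbb{R}$, let

$$\hat\psi^F_{t,\gamma_n} = \frac{1}{\pi_n}\psi^J_{t,\eta_n} - \frac{1 - \pi_n}{\pi_n}\psi^K_{t,\eta_n} + f_n(t)(\hat\psi^\alpha_{\gamma_n} + \gamma_{n,5}\hat\psi^\beta_{\gamma_n}) - \frac{\mathbb{P}_n\psi^J_{t,\eta_n} - \mathbb{P}_n\psi^K_{t,\eta_n}}{\pi_n^2}\hat\psi^\pi_{\gamma_n}, \tag{14}$$

where $\eta_n = (\alpha_n, \beta_n) = (g^\alpha, g^\beta)(\gamma_n)$ and $\pi_n = g^\pi(\gamma_n)$, be an estimated version of the influence function $\psi^F_{t,\gamma_0}$ arising in Proposition 4.2.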
The following proposition, proved in Appendix D, suggests, when n is large, interpreting $t \mapsto \mathbb{G}'_n\hat\psi^F_{t,\gamma_n}$ as an independent copy of $\sqrt{n}(F_n - F)$.

Proposition 4.4. Assume that $\gamma_0 \in D^{\alpha,\beta,\pi}$ and that Assumptions A1, A2, A3 and A4 hold. Then, the process $(t \mapsto \mathbb{G}_n\psi^F_{t,\gamma_0}, t \mapsto \mathbb{G}'_n\hat\psi^F_{t,\gamma_n})$ converges weakly to $(t \mapsto \mathbb{G}\psi^F_{t,\gamma_0}, t \mapsto \mathbb{G}'\psi^F_{t,\gamma_0})$ in $\{\ell^\infty(\mathbb{R})\}^2$, where $t \mapsto \mathbb{G}'\psi^F_{t,\gamma_0}$ is an independent copy of $t \mapsto \mathbb{G}\psi^F_{t,\gamma_0}$.



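Let us now explain how the latter result can be used in practice to obtain an approximate confidence band for F. Let N be a large integer and let $\xi_i^{(j)}$, $i \in \{1, \dots, n\}$, $j \in \{1, \dots, N\}$, be i.i.d. random variables with mean 0 and variance 1, satisfying $\int_0^\infty\{\Pr(|\xi_1^{(1)}| > x)\}^{1/2}dx < \infty$, and independent of the data $(X_i, \tilde Y_i)_{1 \le i \le n}$. For any $j \in \{1, \dots, N\}$, let $\mathbb{G}_n^{(j)} f = n^{-1/2}\sum_{i=1}^n(\xi_i^{(j)} - \bar\xi^{(j)})f(X_i, \tilde Y_i)$, where $\bar\xi^{(j)} = n^{-1}\sum_{i=1}^n \xi_i^{(j)}$. Then, a consequence of Propositions 4.2 and 4.4 is that

$$\left(t \mapsto \sqrt{n}\{F_n(t) - F(t)\}, t \mapsto \mathbb{G}_n^{(1)}\hat\psi^F_{t,\gamma_n}, \dots, t \mapsto \mathbb{G}_n^{(N)}\hat\psi^F_{t,\gamma_n}\right) \rightsquigarrow \left(t \mapsto \mathbb{G}\psi^F_{t,\gamma_0}, t \mapsto \mathbb{G}^{(1)}\psi^F_{t,\gamma_0}, \dots, t \mapsto \mathbb{G}^{(N)}\psi^F_{t,\gamma_0}\right)$$

in $\{\ell^\infty(\mathbb{R})\}^{N+1}$, where $\mathbb{G}^{(1)}, \dots, \mathbb{G}^{(N)}$ are independent copies of the P-Brownian bridge $\mathbb{G}$. From the continuous mapping theorem, it follows that

$$\left(\sup_{t \in \mathbb{R}}|\sqrt{n}\{F_n(t) - F(t)\}|, \sup_{t \in \mathbb{R}}|\mathbb{G}_n^{(1)}\hat\psi^F_{t,\gamma_n}|, \dots, \sup_{t \in \mathbb{R}}|\mathbb{G}_n^{(N)}\hat\psi^F_{t,\gamma_n}|\right) \rightsquigarrow \left(\sup_{t \in \mathbb{R}}|\mathbb{G}\psi^F_{t,\gamma_0}|, \sup_{t \in \mathbb{R}}|\mathbb{G}^{(1)}\psi^F_{t,\gamma_0}|, \dots, \sup_{t \in \mathbb{R}}|\mathbb{G}^{(N)}\psi^F_{t,\gamma_0}|\right)$$

in $[0, \infty)^{N+1}$. The previous result suggests estimating quantiles of $\sup_{t \in \mathbb{R}}|\sqrt{n}\{F_n(t) - F(t)\}|$ using the generalized inverse of the empirical c.d.f.

$$G_{n,N}(x) = \frac{1}{N}\sum_{j=1}^N \mathbf{1}\left(\sup_{t \in \mathbb{R}}|\mathbb{G}_n^{(j)}\hat\psi^F_{t,\gamma_n}| \le x\right). \tag{15}$$

A large-sample confidence band of level $1 - p$ for F is thus given by $F_n \pm G_{n,N}^{-1}(1 - p)/\sqrt{n}$. Examples of such confidence bands are given in Figures 1 and 2, and the finite-sample properties of the above construction are empirically investigated in Section 5. Note that in all our numerical experiments, the multipliers were taken from the standard normal distribution, and that the supremum in the previous display was replaced by a maximum over 100 points $U_1, \dots, U_{100}$ uniformly spaced over the interval $[\min_{1 \le i \le n}(\tilde Y_i - \alpha_n - \beta_n X_i), \max_{1 \le i \le n}(\tilde Y_i - \alpha_n - \beta_n X_i)]$.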

Finally, notice that Proposition 4.4 also suggests estimating the standard error of $F_n(t)$ for some fixed $t \in \mathbb{R}$ by $n^{-1/2}[\mathbb{P}_n\{\hat\psi^F_{t,\gamma_n} - \mathbb{P}_n\hat\psi^F_{t,\gamma_n}\}^2]^{1/2}$. The finite-sample performance of this estimator is investigated in Section 5 for different values of t.
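The band construction and the pointwise standard error can be sketched as follows, assuming the values $\hat\psi^F_{t_k,\gamma_n}(X_i, \tilde Y_i)$ from (14) have already been computed on a grid $t_1, \dots, t_K$ (for instance the 100 points $U_1, \dots, U_{100}$) and stored row-wise in `psi`. Standard normal multipliers are used, as in the experiments above; the function names are hypothetical.

```python
import math
import random


def multiplier_band(psi, n_boot=1000, p=0.05, seed=None):
    """Half-width G_{n,N}^{-1}(1 - p) / sqrt(n) of the band F_n +/- half-width.

    psi[i][k] holds hat-psi^F_{t_k, gamma_n}(X_i, Y~_i) on a grid t_1, ..., t_K
    (computing these values is left to the caller).
    """
    rng = random.Random(seed)
    n, n_grid = len(psi), len(psi[0])
    sups = []
    for _ in range(n_boot):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]  # standard normal multipliers
        xi_bar = sum(xi) / n
        sups.append(max(
            abs(sum((xi[i] - xi_bar) * psi[i][k] for i in range(n))) / math.sqrt(n)
            for k in range(n_grid)
        ))
    sups.sort()
    # Generalized inverse of the empirical c.d.f. (15): smallest order statistic
    # s_(r) such that r / N >= 1 - p.
    r = max(1, math.ceil(n_boot * (1.0 - p) - 1e-9))
    return sups[r - 1] / math.sqrt(n)


def pointwise_se(psi_column):
    """n^{-1/2} times the empirical standard deviation of hat-psi^F_{t, gamma_n}."""
    n = len(psi_column)
    mean = sum(psi_column) / n
    return math.sqrt(sum((v - mean) ** 2 for v in psi_column) / n) / math.sqrt(n)
```

With a constant influence column, the centered multipliers cancel and the half-width is numerically zero, which is a quick consistency check.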
5 Monte Carlo experiments

A large number of Monte Carlo experiments was carried out to investigate the influence on the estimators of various factors such as the degree of overlap of the mixed populations, the proportion $\pi_0$ of the unknown component, or the shape of the noise $\varepsilon$ involved in the unknown regression model. Starting from (1), the following generic data generating models were considered:

WO: $\varepsilon^* \sim N(0, 1)$, $(\alpha_0, \beta_0) = (2, 1)$, $X \sim N(2, 3^2)$, $V(\varepsilon) = 1$,
MO: $\varepsilon^* \sim N(0, 1)$, $(\alpha_0, \beta_0) = (2, 1)$, $X \sim N(2, 3^2)$, $V(\varepsilon) = 4$,
SO: $\varepsilon^* \sim N(0, 1)$, $(\alpha_0, \beta_0) = (1, 0.5)$, $X \sim N(1, 2^2)$, $V(\varepsilon) = 4$.

The abbreviations WO, MO and SO stand respectively for "Weak Overlap", "Medium Overlap" and "Strong Overlap". Three possibilities were considered for the distribution of $\varepsilon$: the centered normal (the corresponding data generating models will be abbreviated by WOn, MOn and SOn), a gamma distribution with shape parameter equal to two and rate parameter equal to one half, shifted to have mean zero (the corresponding models will be abbreviated by WOg, MOg and SOg), and a standard exponential shifted to have mean zero (the corresponding models will be abbreviated by WOe, MOe and SOe). Depending on the model they are used in, all three error distributions are scaled so that $\varepsilon$ has the desired variance.
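The nine data generating scenarios can be reproduced with a short generator. The parameter table follows the list above; the helper names and the encoding of labels such as 'MOg' (overlap code followed by the error shape) are illustrative assumptions. Note that Python's `random.gammavariate(alpha, beta)` uses a scale parameter, so a rate of one half corresponds to `beta = 2`.

```python
import math
import random

SCENARIOS = {
    # overlap code: (alpha0, beta0, mean of X, sd of X, V(eps))
    "WO": (2.0, 1.0, 2.0, 3.0, 1.0),
    "MO": (2.0, 1.0, 2.0, 3.0, 4.0),
    "SO": (1.0, 0.5, 1.0, 2.0, 4.0),
}


def draw_error(rng, shape, variance):
    """One draw of eps with mean 0 and the requested variance."""
    if shape == "n":
        return rng.gauss(0.0, math.sqrt(variance))
    if shape == "g":
        # Gamma(shape 2, rate 1/2) = Gamma(shape 2, scale 2): mean 4, variance 8.
        return (rng.gammavariate(2.0, 2.0) - 4.0) * math.sqrt(variance / 8.0)
    if shape == "e":
        # Standard exponential: mean 1, variance 1.
        return (rng.expovariate(1.0) - 1.0) * math.sqrt(variance)
    raise ValueError(f"unknown error shape: {shape}")


def generate(scenario, n, pi0, seed=None):
    """Sample (X_i, Y~_i) from model (1) under a scenario such as 'MOg'."""
    alpha0, beta0, mx, sx, v_eps = SCENARIOS[scenario[:2]]
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(mx, sx)
        if rng.random() < pi0:
            y = alpha0 + beta0 * x + draw_error(rng, scenario[2], v_eps)
        else:
            y = rng.gauss(0.0, 1.0)  # eps* ~ N(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys


rng = random.Random(3)
errs = [draw_error(rng, "g", 4.0) for _ in range(50000)]
err_mean = sum(errs) / len(errs)
err_var = sum((e - err_mean) ** 2 for e in errs) / len(errs)
xs, ys = generate("SOe", 500, 0.7, seed=1)
```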

Examples of datasets generated from WOn, MOg and SOe with n = 500 and $\pi_0 = 0.7$ are represented in the first column of graphs of Figure 1. The solid (resp. dashed) lines represent the true (resp. estimated) regression lines. The graphs of the second column represent, for each of WOn, MOg and SOe, the true c.d.f. F of $\varepsilon$ (solid line) and its estimate $F_n$ (dashed line) defined in (11). The dotted lines represent approximate confidence bands of level 0.95 for F computed as explained in Subsection 4.3 with N = 10,000. Finally, the graphs of the third column represent, for each of WOn, MOg and SOe, the true p.d.f. f of $\varepsilon$ (solid line) and its estimate $f_n$ (dashed line) defined in (13).



[Figure 1 about here.] 



For each of the three groups of data generating models, {WOn, MOn, SOn}, {WOg, MOg, SOg} and {WOe, MOe, SOe}, the values 0.4 and 0.7 were considered for $\pi_0$, and the values 100, 300, 1000 and 5000 were considered for n. For each of the nine data generating scenarios, each value of $\pi_0$, and each value of n, M = 1000 random samples were generated. Tables 1, 2 and 3 report the number m of samples out of M for which $\pi_n \notin (0, 1]$, as well as the estimated bias and standard deviation of $\alpha_n$, $\beta_n$, $\pi_n$, $F_n\{F^{-1}(0.1)\}$, $F_n\{F^{-1}(0.5)\}$ and $F_n\{F^{-1}(0.9)\}$ computed from the M − m valid estimates.
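The table summaries can be computed as below: replications with $\pi_n \notin (0, 1]$ are counted in m and discarded, and the bias and standard deviation are computed from the M − m remaining ones. The function name is hypothetical, and the use of the $n - 1$ denominator for the standard deviation is an assumption, since the convention is not stated.

```python
import math


def summarize(estimates, pis, truth):
    """Return (m, bias, sd) over the replications with pi_n in (0, 1]."""
    valid = [e for e, p in zip(estimates, pis) if 0.0 < p <= 1.0]
    m = len(estimates) - len(valid)  # number of discarded replications
    mean = sum(valid) / len(valid)
    sd = math.sqrt(sum((e - mean) ** 2 for e in valid) / (len(valid) - 1))
    return m, mean - truth, sd


result = summarize([1.0, 3.0, 100.0], [0.5, 1.0, 1.2], truth=1.5)
```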



[Table 1 about here.] 



[Table 2 about here.] 
[Table 3 about here.]

A first general comment concerning the results reported in Tables 1, 2 and 3 is that the number m of samples for which $\pi_n \notin (0, 1]$ is the highest for the SO scenarios, followed by the MO scenarios and then the WO scenarios. Also, for a fixed amount of overlap between the two mixed populations, it is when the distribution of $\varepsilon$ is exponential that m tends to be the highest, followed by the gamma and the normal cases. Hence, as expected, the SO scenarios are the hardest and, for a given degree of overlap, the most difficult problems are those involving exponential errors for the unknown regression component.

Influence of the shape of the p.d.f. of e. A surprising result, when observing Tables [lj|2] 
and [3} is that the nature of the distribution of e appears to have very little infiuence on the 
performance of the estimators f3n and 7r„. Under weak and moderate overlap in particular, 
the estimated bias and standard deviations of the estimators are almost unaffected by the 
distribution of the error of the unknown component. 

The effect of the degree of overlap. As expected, the performance of the estimators a„, 
f3n and 7r„ is strongly affected by the degree of overlap. Notice however that the results 
obtained under the WO and MO data generating scenarios are rather comparable, while the 
performance of the estimators gets significantly worse when switching to the SO scenarios, 
especially for 7r„. Notice also that, overall, the biases of a„ and /3„ are negative under WO 
and MO and positive under SO. 

The influence of ttq. For a given degree of overlap and sample size, the parameter that 
seems to affect the most the performance of the estimators is the proportion ttq of the 
unknown component. On one hand, the number of samples for which 7r„ ^ (0, 1] is lower 
for TTo = 0.4 than for ttq = 0.7. On the other hand, when considering the samples for which 
TTn G (0, 1], the finite-sample behavior of a„ and /3„ improves very clearly when ttq switches 
from 0.4 to 0.7. 

Performance of the functional estimator. The study of $F_n\{F^{-1}(p)\}$ for $p \in \{0.1, 0.5, 0.9\}$
clearly shows that, for a given degree of overlap between the two mixed populations, the
functional estimator performs best when the distribution of $\varepsilon$ is normal, followed by the
gamma and exponential settings. In addition, it appears that $F_n\{F^{-1}(p)\}$, $p \in \{0.1, 0.5\}$,
behaves best under the MO scenarios and that, somewhat surprisingly, $F_n\{F^{-1}(0.9)\}$
achieves its best results under the SO scenarios.

Asymptotics. The results reported in Tables 1, 2 and 3 are in accordance with the
asymptotic theory stated in the previous section. In particular, as expected, the estimated
biases and standard deviations of all the estimators tend to zero as $n$ increases. Notice,
for instance, that under SOg and SOe with $\pi_0 = 0.4$ (two of the most difficult scenarios),
the estimated standard deviation of $\alpha_n$ is greater than 7 for $n = 100$, drops below 0.7 for
$n = 300$, and becomes very reasonable for $n = 1000$ and $5000$.

Let us now present the results of the Monte Carlo experiments used to investigate
the finite-sample performance of the estimators of the standard errors of $\alpha_n$, $\beta_n$, $\pi_n$ and
$F_n\{F^{-1}(p)\}$, $p \in \{0.1, 0.5, 0.9\}$, mentioned below Proposition 4.1 and at the end of
Subsection 4.3, respectively. The setting is the same as previously, except that
$n \in \{100, 300, 1000, 5000, 25000\}$. The results are partially reported in Table 4, which gives,
for scenarios WOn, MOg and SOe and each of the aforementioned estimators, the standard
deviation of the estimates multiplied by $\sqrt{n}$ and the mean of the estimated standard errors
multiplied by $\sqrt{n}$. As can be seen, for all estimators and all scenarios, the standard
deviation of the estimates and the mean of the estimated standard errors are always very
close for $n = 25000$. The convergence to zero of the difference between these two quantities
nevertheless appears slower for $F_n\{F^{-1}(p)\}$, $p \in \{0.1, 0.5, 0.9\}$, than for $\alpha_n$, $\beta_n$ and $\pi_n$,
the worst results being obtained for $F_n\{F^{-1}(0.1)\}$. The results also confirm that the SO
scenarios are the hardest. Notice finally that the estimated standard errors of $\alpha_n$ and $\beta_n$
seem to underestimate, on average, the variability of $\alpha_n$ and $\beta_n$, and that the variability of
$\pi_n$ and $F_n\{F^{-1}(p)\}$, $p \in \{0.1, 0.5, 0.9\}$, appears to be underestimated on average for the WO
scenarios and overestimated on average for the SO scenarios.
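Table 4 compares two $\sqrt{n}$-scaled quantities for each estimator. As a minimal sketch, the helper below computes both from the $M$ Monte Carlo estimates of one parameter and the corresponding estimated standard errors. The function name and the toy check, in which a sample mean with standard error $s/\sqrt{n}$ stands in for $\alpha_n$, $\beta_n$ or $\pi_n$, are illustrative assumptions.

```python
import numpy as np

def scaled_spread(estimates, std_errors, n):
    """Return (sqrt(n) * SD of the estimates, sqrt(n) * mean of the estimated SEs).

    estimates  : estimates of one parameter from M Monte Carlo samples of size n
    std_errors : the corresponding estimated standard errors
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    root_n = np.sqrt(n)
    # ddof=1: sample standard deviation across the M replicates
    return root_n * estimates.std(ddof=1), root_n * std_errors.mean()

# Toy check with a sample mean, whose standard error is s / sqrt(n).
rng = np.random.default_rng(1)
M, n = 2000, 100
samples = rng.standard_normal((M, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)
sd_scaled, se_scaled = scaled_spread(means, ses, n)
print(round(sd_scaled, 3), round(se_scaled, 3))  # both should be close to 1
```

When the two returned numbers are close, the standard error estimator tracks the actual variability. A mean SE below the scaled SD corresponds to the underestimation reported above for $\alpha_n$ and $\beta_n$.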

[Table 4 about here.] 

We end this section with an investigation of the finite-sample properties of the confidence
band construction proposed in Subsection 4.3. Table 5 reports the proportion of samples for
which

$$\max_{t \in \{U_1, \dots, U_{100}\}} |F_n(t) - F(t)| > n^{-1/2} G_{n,N}^{-1}(0.95),$$

where $G_{n,N}$ is defined as in (15) with $N = 1000$, and $U_1, \dots, U_{100}$ are uniformly spaced over
the interval $[\min_{1 \le i \le n}(Y_i - \alpha_n - \beta_n X_i), \max_{1 \le i \le n}(Y_i - \alpha_n - \beta_n X_i)]$. As could have been
partly expected from the results reported in Table 4, the confidence bands are too narrow
on average for the WO and MO scenarios, the worst results being obtained when the error
of the unknown component is exponential. The results are, overall, more satisfactory for the
SO scenarios. In all cases, the estimated coverage probability appears to converge to 0.95,
although the convergence seems slow.
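The non-coverage criterion above can be sketched as follows. The sketch assumes that the estimated and true c.d.f.s are available as vectorized callables and that the quantile $G_{n,N}^{-1}(0.95)$ has already been computed from (15), which is not reproduced here. In the toy check, the empirical c.d.f. of an $N(0,1)$ sample plays the role of $F_n$, and the asymptotic Kolmogorov 0.95 quantile (about 1.358) stands in for $G_{n,N}^{-1}(0.95)$. Both substitutions are assumptions made only to exercise the code.

```python
import math
import numpy as np

def band_exceeded(F_hat, F_true, residuals, n, q95, n_grid=100):
    """True if max over a uniform grid of |F_hat - F_true| exceeds q95 / sqrt(n).

    F_hat, F_true : vectorized callables (estimated and true c.d.f. of the error)
    residuals     : Y_i - alpha_n - beta_n * X_i; their range defines the grid
    q95           : stand-in for the quantile G_{n,N}^{-1}(0.95), computed elsewhere
    """
    grid = np.linspace(residuals.min(), residuals.max(), n_grid)
    sup_dev = np.max(np.abs(F_hat(grid) - F_true(grid)))
    return sup_dev > q95 / math.sqrt(n)

# Toy check: empirical c.d.f. of an N(0, 1) sample against the true c.d.f.
phi_cdf = np.vectorize(lambda u: 0.5 * (1.0 + math.erf(u / math.sqrt(2.0))))
rng = np.random.default_rng(2)
M, n = 1000, 200
exceed = 0
for _ in range(M):
    e = np.sort(rng.standard_normal(n))
    ecdf = lambda t, e=e: np.searchsorted(e, t, side="right") / e.size
    exceed += band_exceeded(ecdf, phi_cdf, e, n, q95=1.358)
p_hat = exceed / M  # proportion of samples whose curve leaves the band
print(p_hat)
```

The band itself is $F_n(t) \pm n^{-1/2} G_{n,N}^{-1}(0.95)$. Restricting the maximum to 100 grid points, as in the text, can only lower the non-coverage proportion relative to a supremum over all $t$.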

[Table 5 about here.] 



6 Illustrations 



We first applied the proposed method to a dataset initially reported in Cohen (1980) and
subsequently analyzed by De Veaux (1989) and Hunter and Young (2012), among others.
The dataset consists of $n = 150$ observations $(x_i, \tilde y_i)$, where the $x_i$ are actual tones and
the $\tilde y_i$ are the corresponding tones perceived by a trained musician. To apply the proposed
semiparametric approach, we assume that the equation of the tilted component is $y = x$.
Such a hypothesis seems to be in accordance with the detailed description of the dataset
given in Hunter and Young (2012). The transformation $y_i = \tilde y_i - x_i$ was then applied to
obtain a dataset $(x_i, y_i)$ that fits into the setting considered in this work. The original
dataset and the transformed dataset are represented in the upper left and upper right plots
of Figure 2.



[Figure 2 about here.] 



The approach proposed in this paper was applied under the assumption that the distribution
of $\varepsilon^*$ in (1) is normal with standard deviation 0.079. The latter value was obtained by
considering the upper right plot of Figure 2 and by computing the sample standard deviation
of the $y_i$ such that $y_i \in (-0.25, 0.25)$ and $x_i < 1.75$ or $x_i > 2.25$.
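The preprocessing described above amounts to subtracting the known line and computing a sample standard deviation on a subset of points. A minimal sketch follows, with hypothetical helper names and made-up numbers in place of the actual tone data. The text does not state whether the $n - 1$ divisor was used, so `ddof=1` is an assumption. The condition is read as "$y_i$ in the interval, and $x_i$ outside $[1.75, 2.25]$", which is also an assumption.

```python
import numpy as np

def remove_known_line(x, y_tilde, intercept=0.0, slope=1.0):
    """y_i = y~_i - (intercept + slope * x_i); the defaults give y_i = y~_i - x_i."""
    return np.asarray(y_tilde, float) - (intercept + slope * np.asarray(x, float))

def known_component_sd(x, y, half_width=0.25, x_low=1.75, x_high=2.25):
    """Sample SD of the y_i in (-half_width, half_width) with x_i < x_low or x_i > x_high."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mask = (np.abs(y) < half_width) & ((x < x_low) | (x > x_high))
    return y[mask].std(ddof=1)

# Toy illustration with made-up numbers (not the tone data).
x = np.array([1.5, 1.6, 2.0, 2.5, 3.0])
y_tilde = x + np.array([0.1, -0.1, 0.0, 0.2, 0.9])
y = remove_known_line(x, y_tilde)
sd_star = known_component_sd(x, y)  # uses only the points with y = 0.1, -0.1, 0.2
print(sd_star)
```

The same helpers cover the ChIPmix preprocessing discussed below. For that dataset, use `intercept=1.47, slope=0.82`, and then `half_width=np.inf, x_low=8.5, x_high=14.0`, since no restriction on the $y_i$ is stated there.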

The estimate (1.652, -0.817, 0.790) was obtained for $(\alpha_0, \beta_0, \pi_0)$, with (0.217, 0.108, 0.104)
as the vector of estimated standard errors. The corresponding estimated regression line is
represented by a solid line in the upper right plot of Figure 2. The estimate $(F_n \vee 0) \wedge 1$
(resp. $f_n \vee 0$) of the unknown c.d.f. $F$ (resp. p.d.f. $f$) of $\varepsilon$ is represented in the lower left
(resp. right) plot of Figure 2. The dotted lines in the lower left plot represent an approximate
confidence band of level 0.95 for $F$ computed as explained in Subsection 4.3 using $N = 10{,}000$.
Note that, given the results of the previous section, the latter is probably too narrow.
Numerical integration using the R function integrate (R Development Core Team, 2012) gave
$\int (f_n \vee 0) \approx 1.01$. The results reported in Figure 2 suggest that a normal assumption for
the error of the second component might not be appropriate.
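The truncations $(F_n \vee 0) \wedge 1$ and $f_n \vee 0$ are pointwise clipping operations. The sketch below applies them on a grid and approximates $\int (f_n \vee 0)$ with the trapezoidal rule, a simple stand-in for the adaptive quadrature performed by R's integrate. The toy density is hypothetical: it integrates to (nearly) one but is negative on part of the grid. If $f_n$ itself integrates to one, clipping its negative part can only increase the total mass, which is consistent with values slightly above one.

```python
import numpy as np

def truncate_cdf(F_vals):
    """(F_n v 0) ^ 1: clip the estimated c.d.f. to [0, 1] pointwise."""
    return np.clip(F_vals, 0.0, 1.0)

def truncate_pdf(f_vals):
    """f_n v 0: set negative values of the estimated p.d.f. to zero."""
    return np.maximum(f_vals, 0.0)

def trapezoid_mass(f_vals, grid):
    """Trapezoidal-rule approximation of the integral of f over [grid[0], grid[-1]]."""
    f_vals, grid = np.asarray(f_vals, float), np.asarray(grid, float)
    return float(np.sum(0.5 * (f_vals[1:] + f_vals[:-1]) * np.diff(grid)))

# Hypothetical values standing in for F_n and f_n evaluated on a grid.
F_clipped = truncate_cdf(np.array([-0.02, 0.30, 1.01]))  # -> [0.0, 0.3, 1.0]
grid = np.linspace(-4.0, 4.0, 2001)
f_raw = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi) - 0.05 * np.sin(np.pi * grid / 2)
mass_raw = trapezoid_mass(f_raw, grid)                     # close to one
mass_clipped = trapezoid_mass(truncate_pdf(f_raw), grid)   # larger than mass_raw
print(mass_raw, mass_clipped)
```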

As a second application, we considered the NimbleGen high-density array dataset analyzed
by Martin-Magniette et al. (2008). The dataset, produced by a two-color ChIP-chip
experiment, consists of $n = 176{,}343$ observations $(x_i, \tilde y_i)$. A parametric mixture of linear
regressions with two unknown components was fitted to the data by Martin-Magniette et al.
(2008) under the assumption of normal errors using an EM approach. More details can be
found in Vandekerkhove (2012, Section 4.4). The latter author suggested treating the
intercept and the slope of the first component as precisely estimated by the values
1.47 and 0.82, respectively, obtained by Martin-Magniette et al. (2008), and applied the
transformation $y_i = \tilde y_i - (1.47 + 0.82 x_i)$ to obtain a dataset $(x_i, y_i)$ that fits into the setting
considered in this work. The original dataset of Martin-Magniette et al. (2008) and the
transformed dataset are represented in the upper left and upper right plots of Figure 3.



[Figure 3 about here.] 



The approach proposed in this work was applied under the hypothesis that the distribution
of $\varepsilon^*$ in (1) is normal with standard deviation 0.492. The latter value comes from
inspection of the upper right plot of Figure 3 and is the sample standard deviation of the
$y_i$ for which $x_i < 8.5$ or $x_i > 14$.

The estimate (0.483, 0.075, 0.351) was obtained for $(\alpha_0, \beta_0, \pi_0)$, with (0.037, 0.002, 0.008)
as the vector of estimated standard errors. The corresponding estimated regression line is
represented by a solid line in the upper right plot of Figure 3, while the dashed line represents
the (transformed) regression line estimated by Martin-Magniette et al. (2008) under the
assumption of normal errors. The estimate $(F_n \vee 0) \wedge 1$ (resp. $f_n \vee 0$) of the unknown c.d.f. $F$
(resp. p.d.f. $f$) of $\varepsilon$ is represented in the lower left (resp. right) plot of Figure 3. Numerical
integration using the R function integrate gave $\int (f_n \vee 0) \approx 1.03$. The estimation of
$(\alpha_0, \beta_0, \pi_0, f, F)$, implemented in R, took less than 30 seconds on one 2.4 GHz processor.
The lower right plot of Figure 3 clearly confirms that a normal assumption for the error of
the second component is not appropriate.



7 Extension of the model and discussion 

From the two illustrations presented in the previous section, we see that the price to pay
for imposing no parametric constraints on the second component is a complete specification of the
first component. As mentioned in Section 2, from a theoretical perspective, it is possible to
improve this situation by introducing an unknown scale parameter for the first component.
Using the notation previously defined, the extended model that we have in mind can be
written as

$$Y = \begin{cases} \sigma_0^* \varepsilon^* & \text{if } Z = 0,\\ \alpha_0 + \beta_0 X + \varepsilon & \text{if } Z = 1, \end{cases} \qquad (16)$$

where $\varepsilon^*$ is assumed to have variance one and known c.d.f. $F^*$, while $\sigma_0^*$ is unknown. With
respect to the model given in (1), this simply amounts to writing $\varepsilon^*$ as $\sigma_0^*\varepsilon^*$ and the c.d.f. $F^*$
of $\varepsilon^*$ as $F^*(\cdot/\sigma_0^*)$. The Euclidean parameter vector of this extended model is therefore
$(\alpha_0, \beta_0, \pi_0, \sigma_0^*)$, and the functional parameter is $F$, the c.d.f. of $\varepsilon$.



The model given in (16) is identifiable provided $\mathcal{X}$, the set of possible values of $X$, contains
four points $x_1, x_2, x_3, x_4$ such that the vectors $\{(1, x_j, x_j^2, x_j^3)\}_{1 \le j \le 4}$ are linearly independent.
This can be verified by using, in addition to (5) and (6), the fact that

$$E(Y^3 \mid X) = \pi_0\alpha_0(\alpha_0^2 + 3\sigma_0^2) + 3\pi_0\beta_0(\alpha_0^2 + \sigma_0^2)X + 3\pi_0\alpha_0\beta_0^2X^2 + \pi_0\beta_0^3X^3 \quad \text{a.s.} \qquad (17)$$

By proceeding as in Section 3, one can, for instance, show that

$$(\sigma_0^*)^2 = \frac{\gamma_{0,3}\gamma_{0,5} - \gamma_{0,7}\gamma_{0,2}/3}{\gamma_{0,5} - \gamma_{0,2}^2}, \qquad (18)$$

where $\gamma_{0,2}$ is the coefficient of $X$ in (5), $\gamma_{0,3}$ and $\gamma_{0,5}$ are the coefficients of 1 and $X^2$,
respectively, in (6), and $\gamma_{0,7}$ is the coefficient of $X$ in (17).
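The route from (5), (6) and (17) to (18) can be made explicit. The derivation below is a sketch under assumptions, because (5) and (6) are not reproduced in this section. It takes both errors to have mean zero, writes $\sigma_0^2$ for the variance of $\varepsilon$, and assumes that, under (16), (5) and (6) read $E(Y \mid X) = \pi_0\alpha_0 + \pi_0\beta_0X$ and $E(Y^2 \mid X) = (1 - \pi_0)(\sigma_0^*)^2 + \pi_0(\alpha_0^2 + \sigma_0^2) + 2\pi_0\alpha_0\beta_0X + \pi_0\beta_0^2X^2$.

$$\begin{aligned}
\gamma_{0,2} &= \pi_0\beta_0, \qquad \gamma_{0,5} = \pi_0\beta_0^2, \qquad \gamma_{0,7} = 3\pi_0\beta_0(\alpha_0^2 + \sigma_0^2),\\
\gamma_{0,3} &= (1 - \pi_0)(\sigma_0^*)^2 + \pi_0(\alpha_0^2 + \sigma_0^2),\\
\beta_0 &= \frac{\gamma_{0,5}}{\gamma_{0,2}}, \qquad \pi_0 = \frac{\gamma_{0,2}^2}{\gamma_{0,5}}, \qquad \pi_0(\alpha_0^2 + \sigma_0^2) = \frac{\gamma_{0,7}}{3\beta_0} = \frac{\gamma_{0,7}\gamma_{0,2}}{3\gamma_{0,5}},\\
(\sigma_0^*)^2 &= \frac{\gamma_{0,3} - \gamma_{0,7}\gamma_{0,2}/(3\gamma_{0,5})}{1 - \gamma_{0,2}^2/\gamma_{0,5}} = \frac{\gamma_{0,3}\gamma_{0,5} - \gamma_{0,7}\gamma_{0,2}/3}{\gamma_{0,5} - \gamma_{0,2}^2}.
\end{aligned}$$

Both terms of the numerator are nonnegative, since $\gamma_{0,7}\gamma_{0,2}/3 = \pi_0^2\beta_0^2(\alpha_0^2 + \sigma_0^2)$. The numerator is therefore a difference of two positive quantities, which is the source of the instability discussed next.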



From a practical perspective, however, using relationship (18) for estimation (or a similar
equation resulting from (5), (6) and (17)) turned out to be highly unstable. The reason
why estimation of $\sigma_0^*$ by the method of moments does not work satisfactorily seems to be
that $(\sigma_0^*)^2$ is always the difference of two positive quantities. The estimation
of each quantity is not precise enough to ensure that their difference is close to $(\sigma_0^*)^2$, and
the estimated difference is often negative. As an alternative estimation method, an iterative EM-type
algorithm could be used to estimate all the unknown parameters of the extended model.
Unfortunately, a weakness of such algorithms is that, up to now, the asymptotics of the
resulting estimators are not known.
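The instability described above can be explored with a small simulation. The sketch below plugs least-squares estimates of the coefficients of $E(Y^k \mid X)$, $k = 1, 2, 3$, into relationship (18) in the form $(\gamma_{0,3}\gamma_{0,5} - \gamma_{0,7}\gamma_{0,2}/3)/(\gamma_{0,5} - \gamma_{0,2}^2)$. The data-generating design is hypothetical and chosen only for illustration: uniform $X$, normal errors for both components (so that third moments vanish and (17) holds), and the stated parameter values. The printed share of negative estimates and the quantiles depend on that design.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def sigma_star_sq(g2, g3, g5, g7):
    """Relationship (18): (g3*g5 - g7*g2/3) / (g5 - g2**2)."""
    return (g3 * g5 - g7 * g2 / 3.0) / (g5 - g2 ** 2)

def moment_estimate(x, y):
    """Plug least-squares polynomial coefficients of E(Y^k | X), k = 1, 2, 3, into (18)."""
    _, g2 = npoly.polyfit(x, y, 1)              # E(Y | X)   = g1 + g2 X
    g3, _, g5 = npoly.polyfit(x, y ** 2, 2)     # E(Y^2 | X) = g3 + g4 X + g5 X^2
    _, g7, _, _ = npoly.polyfit(x, y ** 3, 3)   # E(Y^3 | X) = g6 + g7 X + g8 X^2 + g9 X^3
    return sigma_star_sq(g2, g3, g5, g7)

def simulate(n, rng, pi0=0.5, alpha0=1.0, beta0=2.0, sigma_star=2.0, sigma=1.0):
    """Hypothetical design for model (16) with normal errors in both components."""
    x = rng.uniform(0.0, 3.0, n)
    z = rng.random(n) < pi0
    y = np.where(z, alpha0 + beta0 * x + sigma * rng.standard_normal(n),
                 sigma_star * rng.standard_normal(n))
    return x, y

# Population check: with the defaults, g2 = 1, g3 = 3, g5 = 2, g7 = 6, giving 2.0**2.
print(sigma_star_sq(1.0, 3.0, 2.0, 6.0))  # 4.0

rng = np.random.default_rng(3)
estimates = np.array([moment_estimate(*simulate(200, rng)) for _ in range(500)])
print("share of negative estimates:", np.mean(estimates < 0))
print("5%, 50%, 95% quantiles:", np.quantile(estimates, [0.05, 0.5, 0.95]))
```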



A Proof of Proposition 4.1 



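Proof. Let us prove (i). From Assumption A1 (i) and (1), we have that $E(X^pY^q)$ is finite
for all integers $p, q \in \{0, 1, 2\}$. It follows that all the components of the vector of expectations
$P\varphi_{\gamma_0} = E\{\varphi_{\gamma_0}(X, Y)\}$ are finite. The strong law of large numbers then implies that
$\mathbb{P}_n\varphi_{\gamma_0} \to P\varphi_{\gamma_0}$ a.s. Using the fact that $\gamma_0$ is a zero of $\gamma \mapsto P\varphi_\gamma$, that $\mathbb{P}_n\varphi_{\gamma_0} = \Gamma_n\gamma_0 - \theta_n$, and
that $\mathbb{P}_n\varphi_{\gamma_n} = \Gamma_n\gamma_n - \theta_n = 0$, we obtain that $\Gamma_n(\gamma_n - \gamma_0) \to 0$ a.s. The strong law of large
numbers also implies that $\Gamma_n \to \Gamma_0$ a.s. Matrix inversion being continuous with respect to any
usual topology on the space of square matrices, Assumption A2 implies that $\Gamma_n^{-1} \to \Gamma_0^{-1}$ a.s.
The continuous mapping theorem then implies that $\Gamma_n^{-1}\Gamma_n(\gamma_n - \gamma_0) = \gamma_n - \gamma_0 \to 0$ a.s.
Since $\gamma_0 \in \mathcal{D}^{\alpha,\beta,\pi}$, the strong consistency of $(\alpha_n, \beta_n, \pi_n)$ is finally again a consequence of the
continuous mapping theorem, as the function

$$\gamma \mapsto (\alpha, \beta, \pi) \qquad (19)$$

from $\mathcal{D}^{\alpha,\beta,\pi}$ to $\mathbb{R}^3$ is continuous on $\mathcal{D}^{\alpha,\beta,\pi}$.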

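Let us now prove (ii). Using the fact that $P\varphi_{\gamma_0} = 0$ and $\mathbb{P}_n\varphi_{\gamma_n} = 0$, we have

$$\mathbb{P}_n\varphi_{\gamma_0} - P\varphi_{\gamma_0} = -(\mathbb{P}_n\varphi_{\gamma_n} - \mathbb{P}_n\varphi_{\gamma_0}) = -(\Gamma_n\gamma_n - \Gamma_n\gamma_0) = -\Gamma_n(\gamma_n - \gamma_0),$$

which implies that $\mathbb{G}_n\varphi_{\gamma_0} = -\Gamma_n\sqrt{n}(\gamma_n - \gamma_0)$. From Assumption A1 (ii) and (4), we
have that the covariance matrix of the random vector $\varphi_{\gamma_0}(X, Y)$ is finite. The multivariate
central limit theorem then implies that $\mathbb{G}_n\varphi_{\gamma_0}$ converges in distribution to a centered
multivariate normal random vector $\mathbb{G}\varphi_{\gamma_0}$ with covariance matrix $P(\varphi_{\gamma_0}\varphi_{\gamma_0}^\top)$. Since
$(\mathbb{G}_n\varphi_{\gamma_0}, \Gamma_n) \rightsquigarrow (\mathbb{G}\varphi_{\gamma_0}, \Gamma_0)$, and under Assumption A2, we obtain, from the continuous
mapping theorem, that

$$\sqrt{n}(\gamma_n - \gamma_0) = -\Gamma_n^{-1}\mathbb{G}_n\varphi_{\gamma_0} \rightsquigarrow -\Gamma_0^{-1}\mathbb{G}\varphi_{\gamma_0}.$$

The map defined in (19) is differentiable at $\gamma_0$ since $\gamma_0 \in \mathcal{D}^{\alpha,\beta,\pi}$. We can thus apply the
delta method with that map to obtain that

Since $\Gamma_n^{-1} \to \Gamma_0^{-1}$ under Assumption A2, we obtain that

It remains to prove that $\Sigma_n \to \Sigma$. Under Assumption A1 (ii), the strong law of large
numbers implies that $\mathbb{P}_n(\varphi_{\gamma_0}\varphi_{\gamma_0}^\top) \to P(\varphi_{\gamma_0}\varphi_{\gamma_0}^\top)$ a.s. The fact that $\mathbb{P}_n(\varphi_{\gamma_n}\varphi_{\gamma_n}^\top) \to P(\varphi_{\gamma_0}\varphi_{\gamma_0}^\top)$
a.s. is then a consequence of the fact that $\gamma_n \to \gamma_0$ a.s. and of the continuous mapping theorem.
Similarly, since $\gamma_0 \in \mathcal{D}^{\alpha,\beta,\pi}$, we additionally have that

Combined with the fact that, under Assumption A2, $\Gamma_n^{-1} \to \Gamma_0^{-1}$ a.s., we obtain
that $\Sigma_n \to \Sigma$ from the continuous mapping theorem. □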



B Proof of Proposition 4.2 



The proof of Proposition 4.2 is based on three lemmas.



Lemma B.1. The classes of functions $\mathcal{F}^J$ and $\mathcal{F}^K$ are P-Donsker. So is the class $\mathcal{F}^{\alpha,\beta,\pi}$,
provided Assumptions A1 (ii) and A2 hold and $\gamma_0 \in \mathcal{D}^{\alpha,\beta,\pi}$.

Proof. The class $\mathcal{F}^J$ is the class of indicator functions $(x, y) \mapsto \mathbf{1}\{(x, y) \in C_{t,\eta}\}$, where
$C_{t,\eta} = \{(x, y) \in \mathbb{R}^2 : y \le t + \alpha + \beta x\}$. The collection $\mathcal{C} = \{C_{t,\eta} : t \in \mathbb{R}, \eta = (\alpha, \beta) \in \mathbb{R}^2\}$
is the set of all half-spaces in $\mathbb{R}^2$. From van der Vaart and Wellner (2000, Exercise 14, p.
152), it is a VC class with VC dimension 4. By Lemma 9.8 of Kosorok (2008), $\mathcal{F}^J$ has the
same VC dimension as $\mathcal{C}$. Being a set of indicator functions, $\mathcal{F}^J$ clearly possesses a
square-integrable envelope function and is therefore P-Donsker.

The class $\mathcal{F}^K$ is a collection of monotone functions, and it is easy to verify that it has VC
dimension 1. Furthermore, it clearly possesses a square-integrable envelope function because
the elements of $\mathcal{F}^K$ are bounded. It is therefore P-Donsker.

The component classes of the class $\mathcal{F}^{\alpha,\beta,\pi}$ are well defined since Assumption A2 holds and
$\gamma_0 \in \mathcal{D}^{\alpha,\beta,\pi}$. It is easy to see that they are linear combinations of a finite collection of
functions that, from Assumption A1 (ii), is P-Donsker. The component classes of $\mathcal{F}^{\alpha,\beta,\pi}$
are therefore VC classes. They possess square-integrable envelope functions because $\mathcal{D}^{\alpha,\beta,\pi}$
is a bounded set. The class $\mathcal{F}^{\alpha,\beta,\pi}$ is therefore P-Donsker. □



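Lemma B.2. Under Assumptions A1 (i) and A3 (i),

$$\sup_{t \in \mathbb{R}} P(\psi^J_{t,\eta} - \psi^J_{t,\eta_0})^2 \to 0 \quad \text{and} \quad \sup_{t \in \mathbb{R}} P(\psi^K_{t,\eta} - \psi^K_{t,\eta_0})^2 \to 0$$

as $\eta \to \eta_0$.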

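Proof. For the class $\mathcal{F}^J$, for any $t \in \mathbb{R}$, we have

$$\begin{aligned}
P(\psi^J_{t,\eta} - \psi^J_{t,\eta_0})^2 &= P\{(\psi^J_{t,\eta} - \psi^J_{t,\eta_0})\mathbf{1}(\alpha_0 + \beta_0x < \alpha + \beta x)\} + P\{(\psi^J_{t,\eta_0} - \psi^J_{t,\eta})\mathbf{1}(\alpha_0 + \beta_0x > \alpha + \beta x)\}\\
&= \int_{\mathbb{R}} \{F_{Y|X}(t + \alpha + \beta x \mid x) - F_{Y|X}(t + \alpha_0 + \beta_0x \mid x)\}\mathbf{1}(\alpha_0 + \beta_0x < \alpha + \beta x)\,dF_X(x)\\
&\quad + \int_{\mathbb{R}} \{F_{Y|X}(t + \alpha_0 + \beta_0x \mid x) - F_{Y|X}(t + \alpha + \beta x \mid x)\}\mathbf{1}(\alpha_0 + \beta_0x > \alpha + \beta x)\,dF_X(x)\\
&\le \int_{\mathbb{R}} |F_{Y|X}(t + \alpha_0 + \beta_0x \mid x) - F_{Y|X}(t + \alpha + \beta x \mid x)|\,dF_X(x),
\end{aligned}$$

where $F_{Y|X}$ is defined in (2). Since $f_{Y|X}(\cdot \mid x)$ defined in (3) exists for all $x \in \mathcal{X}$, the mean
value theorem enables us to write, for any $t \in \mathbb{R}$ and $x \in \mathcal{X}$,

$$F_{Y|X}(t + \alpha + \beta x \mid x) - F_{Y|X}(t + \alpha_0 + \beta_0x \mid x) = f_{Y|X}(t + \alpha_{x,t} + \beta_{x,t}x \mid x)\{(\alpha - \alpha_0) + (\beta - \beta_0)x\},$$

where $\alpha_{x,t} + \beta_{x,t}x$ is between $\alpha + \beta x$ and $\alpha_0 + \beta_0x$. It follows that

$$\begin{aligned}
\sup_{t \in \mathbb{R}} P(\psi^J_{t,\eta} - \psi^J_{t,\eta_0})^2 &\le \sup_{t \in \mathbb{R}} \int_{\mathbb{R}} f_{Y|X}(t + \alpha_{x,t} + \beta_{x,t}x \mid x)\,|(\alpha - \alpha_0) + (\beta - \beta_0)x|\,dF_X(x)\\
&\le \Big\{\sup_{t \in \mathbb{R}} f^*(t) + \sup_{t \in \mathbb{R}} f(t)\Big\}\{|\alpha - \alpha_0| + E(|X|)|\beta - \beta_0|\}.
\end{aligned}$$

Under Assumption A3 (i), the supremum on the right of the previous display is finite and,
under Assumption A1 (i), so is $E(|X|)$. We therefore obtain the desired result.

For the class $\mathcal{F}^K$, we have

$$\begin{aligned}
\sup_{t \in \mathbb{R}} P(\psi^K_{t,\eta} - \psi^K_{t,\eta_0})^2 &= \sup_{t \in \mathbb{R}} \int_{\mathbb{R}} \{F^*(t + \alpha + \beta x) - F^*(t + \alpha_0 + \beta_0x)\}^2\,dF_X(x)\\
&\le \sup_{t \in \mathbb{R}} \int_{\mathbb{R}} |F^*(t + \alpha + \beta x) - F^*(t + \alpha_0 + \beta_0x)|\,dF_X(x),
\end{aligned}$$

from the convexity of $x \mapsto x^2$ on $[0, 1]$. Proceeding as previously, by the mean value theorem,
we obtain that

$$\sup_{t \in \mathbb{R}} P(\psi^K_{t,\eta} - \psi^K_{t,\eta_0})^2 \le \sup_{t \in \mathbb{R}} f^*(t)\{|\alpha - \alpha_0| + E(|X|)|\beta - \beta_0|\}.$$

Under Assumptions A1 (i) and A3 (i), the right-hand side of the previous inequality tends
to zero as $\eta \to \eta_0$. □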

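Lemma B.3. Under Assumptions A1 (ii), A2 and A3 (ii), for any $t \in \mathbb{R}$,

$$\begin{aligned}
\sqrt{n}\{J_n(\eta_n, t) - J(\eta_0, t)\} = \mathbb{G}_n\Big(\psi^J_{t,\eta_0} &+ [(1 - \pi_0)E\{f^*(t + \alpha_0 + \beta_0X)\} + \pi_0 f(t)]\,\psi^\alpha_{\gamma_0}\\
&+ [(1 - \pi_0)E\{Xf^*(t + \alpha_0 + \beta_0X)\} + \pi_0 f(t)E(X)]\,\psi^\beta_{\gamma_0}\Big) + R^J_{n,t}
\end{aligned}$$

and

$$\begin{aligned}
\sqrt{n}\{K_n(\eta_n, t) - K(\eta_0, t)\} &= \sqrt{n}(\mathbb{P}_n\psi^K_{t,\eta_n} - P\psi^K_{t,\eta_0})\\
&= \mathbb{G}_n\Big(\psi^K_{t,\eta_0} + E\{f^*(t + \alpha_0 + \beta_0X)\}\,\psi^\alpha_{\gamma_0} + E\{Xf^*(t + \alpha_0 + \beta_0X)\}\,\psi^\beta_{\gamma_0}\Big) + R^K_{n,t},
\end{aligned}$$

where $\sup_{t \in \mathbb{R}}|R^J_{n,t}| \to 0$ and $\sup_{t \in \mathbb{R}}|R^K_{n,t}| \to 0$ in probability.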
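Proof. We only prove the first statement, as the proof of the second statement is similar. We
have

$$\sqrt{n}\{J_n(\eta_n, t) - J(\eta_0, t)\} = \mathbb{G}_n(\psi^J_{t,\eta_n} - \psi^J_{t,\eta_0}) + \mathbb{G}_n\psi^J_{t,\eta_0} + \sqrt{n}P(\psi^J_{t,\eta_n} - \psi^J_{t,\eta_0}), \qquad t \in \mathbb{R}.$$

Using the fact that $\eta_n \to \eta_0$, Lemma B.1 and Lemma B.2, we can apply Theorem 2.1 in
van der Vaart and Wellner (2007) to obtain that

$$\sup_{t \in \mathbb{R}} |\mathbb{G}_n(\psi^J_{t,\eta_n} - \psi^J_{t,\eta_0})| \to 0 \quad \text{in probability}.$$

Furthermore, for any $t \in \mathbb{R}$, we have

$$\sqrt{n}P(\psi^J_{t,\eta_n} - \psi^J_{t,\eta_0}) = \sqrt{n}\int_{\mathbb{R}} \{F_{Y|X}(t + \alpha_n + \beta_nx \mid x) - F_{Y|X}(t + \alpha_0 + \beta_0x \mid x)\}\,dF_X(x),$$

where $F_{Y|X}$ is defined in (2). Since $f'_{Y|X}(\cdot \mid x)$, the derivative of $f_{Y|X}(\cdot \mid x)$, exists for all $x \in \mathcal{X}$
from Assumption A3 (ii) and (3), we can apply the second-order mean value theorem to
obtain

$$\sqrt{n}P(\psi^J_{t,\eta_n} - \psi^J_{t,\eta_0}) = \sqrt{n}\int_{\mathbb{R}} f_{Y|X}(t + \alpha_0 + \beta_0x \mid x)\{(\alpha_n - \alpha_0) + (\beta_n - \beta_0)x\}\,dF_X(x) + R'_{n,t},$$

where

$$R'_{n,t} = \frac{\sqrt{n}}{2}\int_{\mathbb{R}} f'_{Y|X}(t + \alpha_{x,t,n} + \beta_{x,t,n}x \mid x)\{(\alpha_n - \alpha_0) + (\beta_n - \beta_0)x\}^2\,dF_X(x),$$

and $\alpha_{x,t,n} + \beta_{x,t,n}x$ is between $\alpha_0 + \beta_0x$ and $\alpha_n + \beta_nx$. Now, from (3),

$$\sup_{t \in \mathbb{R}}|R'_{n,t}| \le \frac{\sqrt{n}}{2}\Big\{\sup_{t \in \mathbb{R}}|(f^*)'(t)| + \sup_{t \in \mathbb{R}}|f'(t)|\Big\}\{(\alpha_n - \alpha_0)^2 + (\beta_n - \beta_0)^2E(X^2) + 2|\alpha_n - \alpha_0||\beta_n - \beta_0|E(|X|)\}.$$

The supremum on the right of the previous inequality is finite from Assumption A3 (ii), and
so are $E(|X|)$ and $E(X^2)$ from Assumption A1 (ii). Furthermore, under Assumptions A1 (ii)
and A2, we know from Proposition 4.1 that $\sqrt{n}(\alpha_n - \alpha_0, \beta_n - \beta_0)$ converges in distribution
while $(\alpha_n, \beta_n) \to (\alpha_0, \beta_0)$ a.s. It follows that $\sup_{t \in \mathbb{R}}|R'_{n,t}| \to 0$ in probability. Hence, we obtain that

$$\sqrt{n}P(\psi^J_{t,\eta_n} - \psi^J_{t,\eta_0}) = E\{f_{Y|X}(t + \alpha_0 + \beta_0X \mid X)\}\sqrt{n}(\alpha_n - \alpha_0) + E\{Xf_{Y|X}(t + \alpha_0 + \beta_0X \mid X)\}\sqrt{n}(\beta_n - \beta_0) + R'_{n,t}, \qquad t \in \mathbb{R}.$$

The desired result finally follows from the expression of $f_{Y|X}$ given in (3) and Proposition 4.1.
□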
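Proof of Proposition 4.2. Under Assumptions A1 (ii) and A2, and since $\gamma_0 \in \mathcal{D}^{\alpha,\beta,\pi}$, we
know from Lemma B.1 that the classes $\mathcal{F}^J$, $\mathcal{F}^K$ and $\mathcal{F}^{\alpha,\beta,\pi}$ are P-Donsker. It follows that

$$\big(t \mapsto \mathbb{G}_n\psi^J_{t,\eta_0},\; t \mapsto \mathbb{G}_n\psi^K_{t,\eta_0},\; \mathbb{G}_n\psi^{\alpha,\beta,\pi}_{\gamma_0}\big)$$

converges weakly in $\{\ell^\infty(\mathbb{R})\}^2 \times \mathbb{R}^3$. Assumption A3 (i) then implies that the functions
$t \mapsto E\{f_{Y|X}(t + \alpha_0 + \beta_0X \mid X)\}$, $t \mapsto E\{Xf_{Y|X}(t + \alpha_0 + \beta_0X \mid X)\}$, $t \mapsto E\{f^*(t + \alpha_0 + \beta_0X)\}$
and $t \mapsto E\{Xf^*(t + \alpha_0 + \beta_0X)\}$ are bounded. By the continuous mapping theorem, we thus
obtain that

$$\begin{pmatrix}
t \mapsto \mathbb{G}_n[\psi^J_{t,\eta_0} + E\{f_{Y|X}(t + \alpha_0 + \beta_0X \mid X)\}\psi^\alpha_{\gamma_0} + E\{Xf_{Y|X}(t + \alpha_0 + \beta_0X \mid X)\}\psi^\beta_{\gamma_0}]\\
t \mapsto \mathbb{G}_n[\psi^K_{t,\eta_0} + E\{f^*(t + \alpha_0 + \beta_0X)\}\psi^\alpha_{\gamma_0} + E\{Xf^*(t + \alpha_0 + \beta_0X)\}\psi^\beta_{\gamma_0}]\\
\mathbb{G}_n\psi^\pi_{\gamma_0}
\end{pmatrix}$$

converges weakly in $\{\ell^\infty(\mathbb{R})\}^2 \times \mathbb{R}$. It follows from Proposition 4.1 and Lemma B.3 that

$$\big(t \mapsto \sqrt{n}\{J_n(\eta_n, t) - J(\eta_0, t)\},\; t \mapsto \sqrt{n}\{K_n(\eta_n, t) - K(\eta_0, t)\},\; \sqrt{n}(\pi_n - \pi_0)\big)$$

converges weakly in $\{\ell^\infty(\mathbb{R})\}^2 \times \mathbb{R}$. The desired result is finally a consequence of (11) and
the functional delta method applied with the map $(J, K, \pi) \mapsto \{J - (1 - \pi)K\}/\pi$. □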
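The proof above relies on (11) writing $F_n$ as $\{J_n - (1 - \pi_n)K_n\}/\pi_n$, where $J_n$ and $K_n$ are the empirical means of the functions in $\mathcal{F}^J$ and $\mathcal{F}^K$ evaluated at $\eta_n$. Assuming exactly that form, namely $J_n(t) = n^{-1}\sum_i \mathbf{1}(Y_i - \alpha_n - \beta_nX_i \le t)$ and $K_n(t) = n^{-1}\sum_i F^*(t + \alpha_n + \beta_nX_i)$, a minimal vectorized sketch is given below. Each evaluation point costs $O(n)$ operations. The illustration is an oracle check: it plugs in the true $(\alpha_0, \beta_0, \pi_0)$ instead of the method-of-moment estimates, which are not reproduced here, and its simulation design is hypothetical.

```python
import math
import numpy as np

def F_n(t, x, y, alpha_n, beta_n, pi_n, F_star):
    """F_n(t) = {J_n(t) - (1 - pi_n) K_n(t)} / pi_n on a vector of points t, with
    J_n(t) = mean of 1{Y_i - alpha_n - beta_n X_i <= t} and
    K_n(t) = mean of F_star(t + alpha_n + beta_n X_i)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    resid = y - alpha_n - beta_n * x
    J = (resid[None, :] <= t[:, None]).mean(axis=1)
    K = F_star(t[:, None] + alpha_n + beta_n * x[None, :]).mean(axis=1)
    return (J - (1.0 - pi_n) * K) / pi_n

# Oracle illustration: known N(0, 1) first component, N(0, 1) errors in the unknown one.
Phi = np.vectorize(lambda u: 0.5 * (1.0 + math.erf(u / math.sqrt(2.0))))
rng = np.random.default_rng(4)
n, pi0, alpha0, beta0 = 20000, 0.7, 3.0, 1.0
x = rng.uniform(0.0, 2.0, n)
z = rng.random(n) < pi0
y = np.where(z, alpha0 + beta0 * x + rng.standard_normal(n), rng.standard_normal(n))
vals = F_n([-1.0, 0.0, 1.0], x, y, alpha0, beta0, pi0, Phi)
print(vals)  # should be near Phi(-1), Phi(0), Phi(1)
```

Nothing forces these values into $[0, 1]$ or makes them monotone, which is why the illustrations plot $(F_n \vee 0) \wedge 1$.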



C Proof of Proposition 4.3 



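Proof. The assumptions of Proposition 4.1 being verified, we have that $\pi_n \to \pi_0 \neq 0$ a.s. Then,
as can be verified from (13), to show the desired result, it suffices to show that

$$\sup_{t \in \mathbb{R}} \left| \frac{1}{nh_n}\sum_{i=1}^n \kappa\!\left(\frac{t - Y_i + \alpha_n + \beta_nX_i}{h_n}\right) - \frac{1 - \pi_0}{n}\sum_{i=1}^n f^*(t + \alpha_n + \beta_nX_i) - \pi_0 f(t) \right| \to 0 \quad \text{a.s.}$$

The previous supremum is smaller than $I_n + (1 - \pi_0)II_n$, where

$$I_n = \sup_{t \in \mathbb{R}} \left| \frac{1}{nh_n}\sum_{i=1}^n \kappa\!\left(\frac{t - Y_i + \alpha_n + \beta_nX_i}{h_n}\right) - (1 - \pi_0)\int f^*(t + \alpha_0 + \beta_0x)f_X(x)\,dx - \pi_0 f(t) \right|$$

and

$$II_n = \sup_{t \in \mathbb{R}} \left| \frac{1}{n}\sum_{i=1}^n f^*(t + \alpha_n + \beta_nX_i) - \int f^*(t + \alpha_0 + \beta_0x)f_X(x)\,dx \right|.$$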



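Let us first show that $I_n \to 0$ a.s. Consider the class $\mathcal{F}$ of measurable functions from $\mathbb{R}^2$ to $\mathbb{R}$
defined by

$$\mathcal{F} = \left\{(x, y) \mapsto \psi_{\eta,t,h}(x, y) = \kappa\!\left(\frac{t - y + \alpha + \beta x}{h}\right) : \eta = (\alpha, \beta) \in \mathbb{R}^2, t \in \mathbb{R}, h \in (0, \infty)\right\}$$

and notice that

$$\frac{1}{n}\sum_{i=1}^n \kappa\!\left(\frac{t - Y_i + \alpha_n + \beta_nX_i}{h_n}\right) = \mathbb{P}_n\psi_{\eta_n,t,h_n}, \qquad t \in \mathbb{R},$$

where $\eta_n = (\alpha_n, \beta_n)$. Then, $I_n \le I_n' + I_n''$, where

$$I_n' = \sup_{t \in \mathbb{R}} \frac{1}{h_n}\left|(\mathbb{P}_n - P)\psi_{\eta_n,t,h_n}\right| \qquad (20)$$

and

$$I_n'' = \sup_{t \in \mathbb{R}} \left| \frac{1}{h_n}P\psi_{\eta_n,t,h_n} - g(t) \right|,$$

with

$$g(t) = (1 - \pi_0)\int f^*(t + \alpha_0 + \beta_0x)f_X(x)\,dx + \pi_0 f(t), \qquad t \in \mathbb{R}.$$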
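Let us first deal with $I_n''$. From (3), notice that

$$g(t) = \int f_{Y|X}(t + \alpha_0 + \beta_0x \mid x)f_X(x)\,dx, \qquad t \in \mathbb{R}.$$

Also, for any $t \in \mathbb{R}$,

$$P\psi_{\eta_n,t,h_n} = \int \left\{ \int \kappa\!\left(\frac{t - y + \alpha_n + \beta_nx}{h_n}\right) f_{Y|X}(y \mid x)\,dy \right\} f_X(x)\,dx,$$

which, using the change of variable $u = (t - y + \alpha_n + \beta_nx)/h_n$ in the inner integral, can be
rewritten as

$$P\psi_{\eta_n,t,h_n} = h_n \int \left\{ \int \kappa(u) f_{Y|X}(t + \alpha_n + \beta_nx - uh_n \mid x)\,du \right\} f_X(x)\,dx.$$

Since $\kappa$ is a p.d.f. from Assumption A4 (ii), it follows that, for any $t \in \mathbb{R}$,

$$\frac{1}{h_n}P\psi_{\eta_n,t,h_n} - g(t) = \int \left[ \int \kappa(u)\{f_{Y|X}(t + \alpha_n + \beta_nx - uh_n \mid x) - f_{Y|X}(t + \alpha_0 + \beta_0x \mid x)\}\,du \right] f_X(x)\,dx.$$

As $f'_{Y|X}(\cdot \mid x)$, the derivative of $f_{Y|X}(\cdot \mid x)$, exists for all $x \in \mathcal{X}$ under Assumption A3 (ii), the
mean value theorem enables us to write

$$I_n'' \le \Big\{\sup_{t \in \mathbb{R}}|(f^*)'(t)| + \sup_{t \in \mathbb{R}}|f'(t)|\Big\} \int \left[ \int \kappa(u)\{|\alpha_n - \alpha_0| + |\beta_n - \beta_0||x| + |u|h_n\}\,du \right] f_X(x)\,dx.$$

Hence,

$$I_n'' \le \Big\{\sup_{t \in \mathbb{R}}|(f^*)'(t)| + \sup_{t \in \mathbb{R}}|f'(t)|\Big\} \Big\{|\alpha_n - \alpha_0| + |\beta_n - \beta_0|E(|X|) + h_n\int |u|\kappa(u)\,du\Big\},$$

which, from Assumptions A1 (i), A3 (ii), A4 (ii), and Proposition 4.1 (i), implies that $I_n'' \to 0$ a.s.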



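Let us now show that $I_n' \to 0$ a.s. Since $\kappa$ has bounded variation from Assumption A4 (ii),
it can be written as $\kappa_1 - \kappa_2$, where both $\kappa_1$ and $\kappa_2$ are bounded nondecreasing functions
on $\mathbb{R}$. Without loss of generality, we shall assume that $\kappa$, $\kappa_1$ and $\kappa_2$ are bounded by 1. Then,
for $j = 1, 2$, we define

$$\mathcal{F}_j = \left\{(x, y) \mapsto \kappa_j\!\left(\frac{t - y + \alpha + \beta x}{h}\right) : (\alpha, \beta, t) \in \mathbb{R}^3, h \in (0, \infty)\right\}.$$

Proceeding as in Nolan and Pollard (1987, proof of Lemma 22), let us first show that $\mathcal{F}_j$ is
a VC class for $j = 1, 2$. Let $\kappa_j^-$ be the generalized inverse of $\kappa_j$ defined by $\kappa_j^-(c) = \inf\{x \in \mathbb{R} : \kappa_j(x) > c\}$, $c \in \mathbb{R}$. We consider the partition $\{C_1, C_2\}$ of $\mathbb{R}$ defined by

$$\{x \in \mathbb{R} : \kappa_j(x) > c\} = \begin{cases} (\kappa_j^-(c), \infty) & \text{if } c \in C_1,\\ [\kappa_j^-(c), \infty) & \text{if } c \in C_2. \end{cases}$$

Given $(\alpha, \beta, t) \in \mathbb{R}^3$ and $h \in (0, \infty)$, the set

$$\left\{(x, y, c) \in \mathbb{R}^3 : \kappa_j\!\left(\frac{t - y + \alpha + \beta x}{h}\right) > c\right\} \qquad (21)$$

can therefore be written as the union of

$$\{(x, y, c) \in \mathbb{R}^2 \times C_1 : t - y + \alpha + \beta x - h\kappa_j^-(c) > 0\}$$

and

$$\{(x, y, c) \in \mathbb{R}^2 \times C_2 : t - y + \alpha + \beta x - h\kappa_j^-(c) \ge 0\}.$$

Now, let $f_{\alpha,\beta,t,h}(x, y, c) = t - y + \alpha + \beta x - h\kappa_j^-(c)$. The functions $f_{\alpha,\beta,t,h}$, with $(\alpha, \beta, t) \in \mathbb{R}^3$
and $h \in (0, \infty)$, span a finite-dimensional vector space. Hence, from Lemma 18 (ii) in Nolan
and Pollard (1987), the collections of all sets $\{(x, y, c) \in \mathbb{R}^2 \times C_1 : f_{\alpha,\beta,t,h}(x, y, c) > 0\}$ and
$\{(x, y, c) \in \mathbb{R}^2 \times C_2 : f_{\alpha,\beta,t,h}(x, y, c) \ge 0\}$ are VC classes. It follows that the collection of
subgraphs of $\mathcal{F}_j$ defined by (21), and indexed by $(\alpha, \beta, t) \in \mathbb{R}^3$ and $h \in (0, \infty)$, is also VC,
which implies that $\mathcal{F}_j$ is a VC class of functions.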

Given a probability distribution $Q$ on $\mathbb{R}^2$, recall that $L_2(Q)$ is the norm defined by
$(Qf^2)^{1/2}$, with $f$ a measurable function from $\mathbb{R}^2$ to $\mathbb{R}$. Given a class $\mathcal{G}$ of measurable functions
from $\mathbb{R}^2$ to $\mathbb{R}$, the covering number $N(\epsilon, \mathcal{G}, L_2(Q))$ is the minimal number of $L_2(Q)$-balls of
radius $\epsilon > 0$ needed to cover the set $\mathcal{G}$. From Lemma 16 in Nolan and Pollard (1987), since
$\mathcal{F} = \mathcal{F}_1 - \mathcal{F}_2$ and since $\mathcal{F}_1$ and $\mathcal{F}_2$ have for envelope the constant function 1 on $\mathbb{R}^2$, we have

$$\sup_Q N(2\epsilon, \mathcal{F}, L_2(Q)) \le \sup_Q N(\epsilon, \mathcal{F}_1, L_2(Q)) \times \sup_Q N(\epsilon, \mathcal{F}_2, L_2(Q)),$$

where the suprema are taken over all probability measures $Q$ on $\mathbb{R}^2$. Using the fact that both $\mathcal{F}_1$ and $\mathcal{F}_2$ are VC classes of
functions with constant envelope 1, from Theorem 2.6.7 in van der Vaart and Wellner (2000)
(see also the discussion at the top of page 246), we obtain that there exist constants $m > 0$
and $v > 0$ that depend on $\mathcal{F}_1$ and $\mathcal{F}_2$ such that

$$\sup_Q N(\epsilon, \mathcal{F}, L_2(Q)) \le \left(\frac{m}{\epsilon}\right)^{v},$$

for every $0 < \epsilon < m$.



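Then, by Theorem 2.14.9 in van der Vaart and Wellner (2000), there exist constants $c_1 > 0$
and $c_2 > 0$ such that, for every $\epsilon > 0$,

$$\Pr{}^*\Big(\sup_{f \in \mathcal{F}} |\mathbb{G}_nf| > \epsilon\Big) \le c_1\epsilon^{c_2}\exp(-2\epsilon^2).$$

Starting from (20), we thus obtain that, for every $\epsilon > 0$,

$$\begin{aligned}
\Pr{}^*(I_n' > \epsilon) &= \Pr{}^*\Big(\sup_{t \in \mathbb{R}} |\mathbb{G}_n\psi_{\eta_n,t,h_n}| > \sqrt{n}h_n\epsilon\Big)\\
&\le \Pr{}^*\Big(\sup_{f \in \mathcal{F}} |\mathbb{G}_nf| > \sqrt{n}h_n\epsilon\Big) \le c_1(\sqrt{n}h_n\epsilon)^{c_2}\exp(-2nh_n^2\epsilon^2) = a_n.
\end{aligned}$$

From Assumption A4 (i), it can be verified that $a_{n+1}/a_n \to 1$ and that $n(a_{n+1}/a_n - 1) \to -\infty$.
It follows from Raabe's rule that the series with general term $a_n$ converges. The Borel-Cantelli
lemma enables us to conclude that $I_n' \to 0$ a.s., and we therefore obtain that $I_n \to 0$ a.s.

Since $f^*$ has bounded variation from Assumption A4 (iii), one can proceed along the
same lines to show that $II_n \to 0$ a.s. □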
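The supremum controlled in this proof suggests the form of the density estimator (13): a kernel density estimate of the residuals $Y_i - \alpha_n - \beta_nX_i$, minus $(1 - \pi_n)$ times the average of $f^*(t + \alpha_n + \beta_nX_i)$, all divided by $\pi_n$. The sketch below implements that inferred form. The Gaussian kernel, the bandwidth $h_n = 0.2$ and the oracle plug-in of the true parameters are assumptions made for illustration only.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal p.d.f., used here both as the kernel kappa and as f* (assumptions)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def f_n(t, x, y, alpha_n, beta_n, pi_n, h_n, f_star, kernel=gaussian_kernel):
    """Inferred form of (13): {KDE of residuals - (1 - pi_n) mean f*(t + alpha_n + beta_n X_i)} / pi_n."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    resid = y - alpha_n - beta_n * x
    # (n h_n)^{-1} sum_i kappa((t - Y_i + alpha_n + beta_n X_i) / h_n)
    kde = kernel((t[:, None] - resid[None, :]) / h_n).mean(axis=1) / h_n
    # n^{-1} sum_i f*(t + alpha_n + beta_n X_i)
    known = f_star(t[:, None] + alpha_n + beta_n * x[None, :]).mean(axis=1)
    return (kde - (1.0 - pi_n) * known) / pi_n

# Oracle illustration (hypothetical design): N(0, 1) errors in both components.
rng = np.random.default_rng(4)
n, pi0, alpha0, beta0 = 20000, 0.7, 3.0, 1.0
x = rng.uniform(0.0, 2.0, n)
z = rng.random(n) < pi0
y = np.where(z, alpha0 + beta0 * x + rng.standard_normal(n), rng.standard_normal(n))
dens = f_n([0.0, 1.0], x, y, alpha0, beta0, pi0, h_n=0.2, f_star=gaussian_kernel)
print(dens)  # should be near the N(0, 1) density at 0 and 1
```

Because a difference is taken, $f_n$ can dip below zero in the tails, which is why the illustrations report $f_n \vee 0$.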



D Proof of Proposition 4.4 



The proof of Proposition 4.4 is based on the following lemma. 



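Lemma D.1. Let $\Theta \subset \mathbb{R}^p$ and $H \subset \mathbb{R}^q$ for some integers $p, q > 0$, let $\mathcal{F} = \{f_{\theta,\zeta} : \theta \in \Theta, \zeta \in H\}$
be a class of measurable functions from $\mathbb{R}^2$ to $\mathbb{R}$, and let $\zeta_n$ be an estimator of
$\zeta_0 \in H$ such that $\Pr(\zeta_n \in H) \to 1$. If $\mathcal{F}$ is P-Donsker and

$$\sup_{\theta \in \Theta} P(f_{\theta,\zeta_n} - f_{\theta,\zeta_0})^2 \to 0 \quad \text{in probability},$$

then

$$\sup_{\theta \in \Theta} |\mathbb{G}_n'(f_{\theta,\zeta_n} - f_{\theta,\zeta_0})| \to 0 \quad \text{in probability}.$$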



Proof. The result is the analogue of Theorem 2.1 of van der Vaart and Wellner (2007) in
which $\mathbb{G}_n$ is replaced by $\mathbb{G}_n'$. The proof of Theorem 2.1 relies on the fact that $\mathbb{G}_n \rightsquigarrow \mathbb{G}$ in
$\ell^\infty(\mathcal{F})$ and on the uniform continuity of the sample paths of the P-Brownian bridge $\mathbb{G}$;
see van der Vaart (1998, proof of Theorem 19.26) and van der Vaart (2002). From the
functional multiplier central limit theorem (see, e.g., Kosorok, 2008, Theorem 10.1), we know
that $(\mathbb{G}_n, \mathbb{G}_n')$ converges weakly in $\{\ell^\infty(\mathcal{F})\}^2$ to $(\mathbb{G}, \mathbb{G}')$, where $\mathbb{G}'$ is an independent copy of
$\mathbb{G}$. The desired result therefore follows from a straightforward adaptation of the proof
of Theorem 2.1 of van der Vaart and Wellner (2007). □
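The functional multiplier central limit theorem invoked here underlies the resampling used for the confidence bands. The sketch below is generic and rests on an assumption: that the multiplier process takes the usual form $\mathbb{G}_n'f = n^{-1/2}\sum_{i=1}^n \xi_i\{f(X_i, Y_i) - \mathbb{P}_nf\}$ with i.i.d. standard normal multipliers $\xi_i$. Given a matrix of influence values on a grid of points $t$, it returns an empirical quantile of $\sup_t |\mathbb{G}_n'|$ over $N$ multiplier draws. The influence values are supplied by the caller and are hypothetical. The exact functions in (12) and the definition of $G_{n,N}$ in (15) are not reproduced here.

```python
import numpy as np

def multiplier_sup_quantile(influence, n_boot=1000, level=0.95, rng=None):
    """Quantile of sup_j |n^{-1/2} sum_i xi_i (psi_ij - mean_i psi_ij)| over multiplier draws.

    influence : array (n, m); entry (i, j) is a (hypothetical) influence value psi(t_j; X_i, Y_i)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = influence.shape[0]
    centered = influence - influence.mean(axis=0, keepdims=True)
    sups = np.empty(n_boot)
    for b in range(n_boot):
        xi = rng.standard_normal(n)            # i.i.d. multipliers with mean 0, variance 1
        process = xi @ centered / np.sqrt(n)   # multiplier process on the grid t_1, ..., t_m
        sups[b] = np.max(np.abs(process))
    return np.quantile(sups, level)

# Sanity check with the empirical c.d.f. of a uniform sample: psi(t; U) = 1{U <= t}.
rng = np.random.default_rng(5)
u = rng.random(500)
grid = np.linspace(0.0, 1.0, 201)
infl = (u[:, None] <= grid[None, :]).astype(float)
q = multiplier_sup_quantile(infl, n_boot=1000, rng=rng)
print(q)
```

With indicator influence values, the returned quantile should be close to the 0.95 quantile of the Kolmogorov distribution (about 1.36), which makes a convenient sanity check before plugging in the influence functions of $F_n$.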



Proof of Proposition 4.4. Since Assumptions A1 (ii) and A2 hold, we have from Lemma B.1
that $\mathcal{F}^J$, $\mathcal{F}^K$ and $\mathcal{F}^{\alpha,\beta,\pi}$ are P-Donsker. Furthermore, $E(X)$ is finite from Assumption A1 (i),
the function $f$ is bounded from Assumption A3 (i), and so is the function $t \mapsto P(\psi^J_{t,\eta_0} - \psi^K_{t,\eta_0})$
from the definitions of $J$ and $K$ given in (8) and (9). Hence, from the functional multiplier
central limit theorem (see, e.g., Kosorok, 2008, Theorem 10.1) and the continuous mapping
theorem, we obtain that

converges weakly to $(t \mapsto \mathbb{G}\psi^F_t, t \mapsto \mathbb{G}'\psi^F_t)$, where $\psi^F_t$ is defined in (12) and $t \mapsto \mathbb{G}'\psi^F_t$ is an
independent copy of $t \mapsto \mathbb{G}\psi^F_t$. It remains to show that

converges to zero in probability.



From (12) and (14), for any $t \in \mathbb{R}$, we can write

(22)



The last absolute value on the right of the previous display is smaller than

(23)



Now,

(24)



Applying the mean value theorem as in the proof of Lemma B.2, we obtain that

which, combined with the fact that $\eta_n \to \eta_0$, implies that the last term on the right
of (24) converges to zero in probability. From Lemma B.2 and Theorem 2.1 of van der Vaart
and Wellner (2007), we obtain that the first term on the right of (24) converges to zero in
probability. The second term on the right of (24) converges to zero in probability because the
classes $\mathcal{F}^J$ and $\mathcal{F}^K$ are P-Donsker. The convergence to zero in probability of the term on the
left of (24), combined with the fact that $\pi_n \to \pi_0$ and that the multiplier term appearing in (23)
is bounded in probability, implies that the first product in (23) converges to zero in probability
uniformly in $t \in \mathbb{R}$. Furthermore, $\mathcal{F}^{\alpha,\beta,\pi}$ being P-Donsker, and since
$P\|\psi^{\alpha,\beta,\pi}_{\gamma_n} - \psi^{\alpha,\beta,\pi}_{\gamma_0}\|^2 \to 0$ in probability under Assumptions A1 (ii) and A2, we have from
Lemma D.1 that $\mathbb{G}_n'(\psi^{\alpha,\beta,\pi}_{\gamma_n} - \psi^{\alpha,\beta,\pi}_{\gamma_0}) \to 0$ in probability, which implies that the second
product in (23) converges to zero in probability uniformly in $t \in \mathbb{R}$.


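One can similarly show that the other terms on the right of (22) converge to zero in
probability uniformly in $t \in \mathbb{R}$ using, among other arguments, the fact that, from Lemma D.1,
$\sup_{t \in \mathbb{R}}|\mathbb{G}_n'(\psi^J_{t,\eta_n} - \psi^J_{t,\eta_0})|$ and $\sup_{t \in \mathbb{R}}|\mathbb{G}_n'(\psi^K_{t,\eta_n} - \psi^K_{t,\eta_0})|$ converge to zero in probability,
as well as $\sup_{t \in \mathbb{R}}|f_n(t) - f(t)|$, since the assumptions of Proposition 4.3 are satisfied. □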
Figure 1: First column, from top to bottom: datasets generated from WOn, MOg and SOe,
respectively, with $n = 500$ and $\pi_0 = 0.7$; the solid (resp. dashed) lines represent the true
(resp. estimated) regression lines. Second column, from top to bottom: for WOn, MOg
and SOe, respectively, the true c.d.f. $F$ of $\varepsilon$ (solid line) and its estimate $F_n$ (dashed line)
defined in (11). The dotted lines represent approximate confidence bands of level 0.95 for
$F$ computed as explained in Subsection 4.3 with $N = 10{,}000$. Third column, from top to
bottom: for WOn, MOg and SOe, respectively, the true p.d.f. $f$ of $\varepsilon$ (solid line) and its
estimate $f_n$ defined in (13) (dashed line).
Figure 2: Upper left plot: the original tone data. Upper right plot: the transformed data; the
solid line represents the estimated regression line. Lower left plot: the estimate $(F_n \vee 0) \wedge 1$
(solid line) of the unknown c.d.f. $F$ of $\varepsilon$, as well as an approximate confidence band
(dotted lines) of level 0.95 for $F$ computed as explained in Subsection 4.3 with $N = 10{,}000$.
Lower right plot: the estimate $f_n \vee 0$ of the unknown p.d.f. $f$ of $\varepsilon$.






Figure 3: Upper left plot: the original ChIPmix data analyzed by Martin-Magniette et al.
(2008). Upper right plot: the ChIPmix data transformed as in Vandekerkhove (2012); the
solid line represents the regression line estimated by the method in this work, while the
dashed line is the regression line estimated by Martin-Magniette et al. (2008). Lower left
plot: the estimate $(F_n \vee 0) \wedge 1$ of the unknown c.d.f. $F$ of $\varepsilon$. Lower right plot: the estimate
$f_n \vee 0$ of the unknown p.d.f. $f$ of $\varepsilon$.
Table 5: For $M = 1000$ random samples generated under each of the nine scenarios considered
in Section 5, number $m$ of samples out of $M$ for which $\pi_n \notin (0, 1]$, and proportion $p$ out
of the $M - m$ remaining samples for which $F_n$ is not in the approximate confidence band
computed as explained in Subsection 4.3.